[csw-maintainers] Garbage collection in allpkgs

Maciej (Matchek) Bliziński maciej at opencsw.org
Mon Dec 31 14:04:25 CET 2012


2012/12/31 Dagobert Michelsen <dam at opencsw.org>:
> Hi Maciej,
>
> On 30.12.2012 at 19:53, Maciej (Matchek) Bliziński <maciej at opencsw.org> wrote:
>> 2012/12/29 Dagobert Michelsen <dam at opencsw.org>:
>>>> Could we do the
>>>> same somewhere on the buildfarm but not on the master mirror? Then we
>>>> would have an official archive for those that need it but since it
>>>> wouldn't be used that much it would be unnecessary to mirror it, we
>>>> would just link to it from our mirror page.
>>>
>>> This is already the case: allpkgs/ is not included in the main rsync
>>> offering, just in opencsw-full:
>>>
>>>> dam at login [login]:/home/dam > rsync rsync://mirror.opencsw.org
>>>> csw             Legacy name, please switch to the identical 'opencsw'
>>>> opencsw         CSW Primary Mirror, use this if you are mirroring OpenCSW (the archive "allpkgs" is now in 'opencsw-full')
>>>> opencsw-full    CSW Primary Mirror, contains full archive of old packages
>>>> opencsw-future  The proposed future layout of the OpenCSW Primary, layout may change without notice at any time
>>>
>>>
>>> This is done by using the exclude-directive in rsync.conf for "csw" and "opencsw":
>>>        exclude = allpkgs HEADER.txt
>>>
>>> Having all packages on the primary mirror is also good IMHO. This way
>>> each downstream-site can easily select what to offer.
>>
>> I'm not sure what you mean by downstream-sites selecting what to
>> offer.
>
> Official sites mirroring our packages.

I was asking what you mean by selecting what to offer. Downstream
sites I understand.

>> The primary mirror has a set of files, and that's it.
>
> Not quite. All files in the filesystem are available for download.
> However, if you rsync "opencsw" you won't get allpkgs/, so almost all
> of the official mirror sites don't mirror allpkgs/.

So there's a set of files that you get when you rsync and that's it. If
you rsync, you don't get to choose, you get what's given to
you. If you don't, then you're not a full mirror.

>> People
>> can make snapshots from different points in time, is that what you
>> mean?
>
> No, that is different. We don't do this ATM, but we do archive the
> catalog files, so if someone has a specific problem we can regenerate
> everything from that catalog and allpkgs/. This is another reason why
> I think having allpkgs is a Good Thing™.

Did we ever do that? Did we even exercise doing this? I think that in
a real situation we would do something else rather than recreating a
past catalog. Do you have a specific scenario in mind? For example, we
have a bad, I don't know, MySQL. What would make us recreate an old
catalog from archives, rather than selectively solving the problem at
hand?

>> Generally, the oldpkgs archive that people often wanted was there
>> because of the rolling release of the 'current' catalog. At the time,
>> the thinking was that first we scrutinize the hell out of the package,
>> and once we prove to ourselves it's a good package, we push it forward
>> with no good way of rolling it back. If we pushed something bad, we
>> had nothing to say to people other than “scavenge oldpkgs”. These
>> days, we have the testing release; we know that we will every now and
>> then push something bad to unstable, and that people can still use
>> the testing release, and we don't have to panic when there's something
>> broken in unstable. The assumption there is, of course, that the
>> majority of problems will be caught when in unstable.
>
> Right.
>
>> We also keep the old named releases, for instance the dublin release,
>> which is a consistent set of packages and their dependencies.
>
> Yes.
>
>> The 'allpkgs' directory does contain a history of packages, but it
>> doesn't contain all 19 transient (and often broken) versions of MySQL
>> that happened to be in unstable for 2 days, only the 3 or 4 versions
>> that are actually used somewhere in our catalogs.
>
> If it were only for this we wouldn't need allpkgs at all.
>
>> I thought
>> that was good enough. If you disagree, I will put the 24GB of junk
>> back in allpkgs; but I remain unconvinced that they are actually
>> useful.
>
>
> Hopefully I convinced you with the above arguments.

I see some point in keeping old packages for some time. We had one
case in which we needed to restore a version of gcc from allpkgs,
because the version from dublin was too old and the version from
unstable was still problematic. But I still don't see a point in
keeping them forever.
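
To make "for some time, but not forever" concrete, here is a rough sketch
of one possible retention rule. It is not how the buildfarm works today;
the function name, the one-year window, the "last seen in a catalog"
timestamp and the file names are all made up for illustration. The idea:
a file in allpkgs can be collected only if no catalog references it and
it has been out of every catalog for longer than the retention window.

```python
from datetime import datetime, timedelta


def collectable(allpkgs, referenced, now, keep_for=timedelta(days=365)):
    """Return the file names in allpkgs that may be deleted.

    allpkgs:    dict of file name -> datetime the file was last listed in
                any catalog (hypothetical bookkeeping, assumed to exist)
    referenced: set of file names listed in any catalog we still publish
                (unstable, testing, named releases such as dublin)
    keep_for:   retention window for unreferenced files (assumed value)
    """
    cutoff = now - keep_for
    return sorted(
        name
        for name, last_seen in allpkgs.items()
        if name not in referenced and last_seen < cutoff
    )


# Hypothetical example data, not real file names from the mirror.
now = datetime(2012, 12, 31)
allpkgs = {
    "gcc4core-4.6.3.pkg.gz": datetime(2012, 9, 1),  # unreferenced, recent
    "mysql5-5.0.1.pkg.gz": datetime(2010, 1, 1),    # unreferenced, old
    "mysql5-5.0.92.pkg.gz": datetime(2010, 6, 1),   # old, still in a catalog
}
referenced = {"mysql5-5.0.92.pkg.gz"}

print(collectable(allpkgs, referenced, now))
```

With a rule like this, something like the gcc restore would still have
been possible, because the recent unreferenced gcc stays until the window
expires, while the transient builds eventually go away.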

We should focus on realistic scenarios, either ones that we actually
performed (like the gcc restore) or ones that we anticipate and are
able to exercise.

Maciej

