[csw-maintainers] Garbage collection in allpkgs

Maciej (Matchek) Bliziński maciej at opencsw.org
Sun Dec 30 19:53:45 CET 2012


2012/12/29 Dagobert Michelsen <dam at opencsw.org>:
> Hi,
>
> On 29.12.2012 at 15:19, Peter Bonivart <bonivart at opencsw.org> wrote:
>
>> On Sat, Dec 29, 2012 at 2:54 PM, Maciej (Matchek) Bliziński
>> <maciej at opencsw.org> wrote:
>>> I ran garbage collection in our official catalog. This means that I
>>> removed files that were in allpkgs, but were not used/referenced by
>>> any of our catalogs. I managed to remove about 24GB worth of unused
>>> packages. The package files are not deleted, they are only moved out
>>> of allpkgs.
>>>
>>> Does anyone think we should keep old files forever, that is, keep more
>>> than just what's in our history of releases?
>
> We should keep packages forever in allpkgs. I suggest putting them back.

OK, will do.

>> What catalogs did you match against, current ones or also archived ones?

Current ones.

>> I think many users have looked for that lost package from back in the
>> day, we had one single mirror in Germany that didn't rsync with
>> --delete so they basically archived everything but I don't think it
>> was official and they could stop doing that anytime.
>
> Correct. If archived packages are offered and users consider it useful
> it should be us to offer that.

Agreed.

>> Could we do the
>> same somewhere on the buildfarm but not on the master mirror? Then we
>> would have an official archive for those that need it but since it
>> wouldn't be used that much it would be unnecessary to mirror it, we
>> would just link to it from our mirror page.
>
> This is already the case: allpkgs/ is not included in the main rsync
> offering, just in opencsw-full:
>
>> dam at login [login]:/home/dam > rsync rsync://mirror.opencsw.org
>> csw             Legacy name, please switch to the identical 'opencsw'
>> opencsw         CSW Primary Mirror, use this if you are mirroring OpenCSW (the archive "allpkgs" is now in 'opencsw-full')
>> opencsw-full    CSW Primary Mirror, contains full archive of old packages
>> opencsw-future  The proposed future layout of the OpenCSW Primary, layout may change without notice at any time
>
>
> This is done by using the exclude-directive in rsync.conf for "csw" and "opencsw":
>         exclude = allpkgs HEADER.txt
>
> Having all packages on the primary mirror is also good IMHO. This way
> each downstream-site can easily select what to offer.

I'm not sure what you mean by downstream-sites selecting what to
offer. The primary mirror has a set of files, and that's it. People
can make snapshots from different points in time, is that what you
mean?

Generally, the oldpkgs archive that people often wanted was there
because of the rolling release of the 'current' catalog. At the time,
the thinking was that first we scrutinize the hell out of the package,
and once we prove to ourselves it's a good package, we push it forward
with no good way of rolling it back. If we pushed something bad, we
had nothing to say to people other than “scavenge oldpkgs”. These
days, we have the testing release; we know that every now and then we
will push something bad to unstable, that people can still use the
testing release, and that we don't have to panic when something is
broken in unstable. The assumption there is, of course, that the
majority of problems will be caught while in unstable.

We also keep the old named releases, for instance the dublin release,
which is a consistent set of packages and their dependencies.

The 'allpkgs' directory does contain a history of packages, but it
doesn't contain all 19 transient (and often broken) versions of MySQL
that happened to be in unstable for 2 days, only the 3 or 4 versions
that are actually used somewhere in our catalogs. I thought that was
good enough. If you disagree, I will put the 24GB of junk back in
allpkgs; but I remain unconvinced that they are actually useful.
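
For readers following the thread, the garbage collection discussed here
boils down to: collect every file name referenced by a set of catalogs,
then move anything else out of allpkgs without deleting it. Below is a
minimal Python sketch of that idea. It is not the script that was
actually run. The function names, the "attic" destination directory and
the assumption that the package file name is the 4th whitespace-separated
field of a catalog line are all illustrative assumptions.

```python
import os
import shutil


def referenced_files(catalog_paths):
    """Return the set of package file names mentioned in the given catalogs."""
    names = set()
    for path in catalog_paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                fields = line.split()
                # Assumption: a package entry has at least 4 fields and the
                # 4th one is the file name. Comments, signature armor and
                # other short lines are skipped.
                if len(fields) < 4 or line.startswith(("#", "-")):
                    continue
                names.add(fields[3])
    return names


def collect_garbage(allpkgs_dir, catalog_paths, attic_dir, dry_run=True):
    """Move files not referenced by any catalog from allpkgs_dir to attic_dir.

    Returns the sorted list of unreferenced file names. With dry_run=True
    (the default) nothing is moved; the list is only reported.
    """
    keep = referenced_files(catalog_paths)
    unreferenced = []
    for name in sorted(os.listdir(allpkgs_dir)):
        src = os.path.join(allpkgs_dir, name)
        if not os.path.isfile(src) or name in keep:
            continue
        if not dry_run:
            # Files are moved, not deleted, so they can be put back later.
            os.makedirs(attic_dir, exist_ok=True)
            shutil.move(src, os.path.join(attic_dir, name))
        unreferenced.append(name)
    return unreferenced
```

Whether "referenced" means current catalogs only or archived ones as
well is decided purely by which paths are passed in catalog_paths, which
is the question raised earlier in the thread.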

Maciej

