[csw-maintainers] Garbage collection in allpkgs

Dagobert Michelsen dam at opencsw.org
Mon Dec 31 13:23:14 CET 2012


Hi Maciej,

On 30.12.2012 at 19:53, Maciej (Matchek) Bliziński <maciej at opencsw.org> wrote:
> 2012/12/29 Dagobert Michelsen <dam at opencsw.org>:
>>> Could we do the
>>> same somewhere on the buildfarm but not on the master mirror? Then we
>>> would have an official archive for those that need it but since it
>>> wouldn't be used that much it would be unnecessary to mirror it, we
>>> would just link to it from our mirror page.
>> 
>> This is already the case: allpkgs/ is not included in the main rsync
>> offering, just in opencsw-full:
>> 
>>> dam at login [login]:/home/dam > rsync rsync://mirror.opencsw.org
>>> csw             Legacy name, please switch to the identical 'opencsw'
>>> opencsw         CSW Primary Mirror, use this if you are mirroring OpenCSW (the archive "allpkgs" is now in 'opencsw-full')
>>> opencsw-full    CSW Primary Mirror, contains full archive of old packages
>>> opencsw-future  The proposed future layout of the OpenCSW Primary, layout may change without notice at any time
>> 
>> 
>> This is done by using the exclude directive in rsyncd.conf for "csw" and "opencsw":
>>        exclude = allpkgs HEADER.txt
>> 
>> Having all packages on the primary mirror is also good IMHO. This way
>> each downstream-site can easily select what to offer.
> 
> I'm not sure what you mean by downstream-sites selecting what to
> offer.

Official sites mirroring our packages.

> The primary mirror has a set of files, and that's it.

Not quite. All files in the filesystem are available for download.
However, if you rsync "opencsw" you won't get allpkgs/, so almost all
of the official mirror sites don't mirror allpkgs/.
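
For illustration, the split between the modules might look roughly like
this in rsyncd.conf. This is only a sketch: the module names, comments
and the exclude line are taken from the listing quoted above, while the
path /export/mirror/opencsw is a made-up placeholder, and I am assuming
here that both modules point at the same directory tree.

```
# Sketch only: the path below is a hypothetical placeholder.
[opencsw]
    path = /export/mirror/opencsw
    comment = CSW Primary Mirror, use this if you are mirroring OpenCSW
    exclude = allpkgs HEADER.txt

# Assumed to serve the same tree, just without excluding allpkgs.
[opencsw-full]
    path = /export/mirror/opencsw
    comment = CSW Primary Mirror, contains full archive of old packages
```

With a layout like this, a site that syncs "opencsw" never sees allpkgs/,
while a site that wants the archive syncs "opencsw-full" instead.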

> People
> can make snapshots from different points in time, is that what you
> mean?

No, that is different. We don't do this ATM, but we do archive the
catalog files, so if someone has a specific problem we can regenerate
everything from that catalog and allpkgs/. This is another reason why
I think having allpkgs is a Good Thing™.
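
As a rough sketch of that regeneration step (not our actual tooling):
given an archived catalog and allpkgs/, a release directory can be
repopulated by picking out the files the catalog names. The function
name rebuild_release and all paths are made up for the example, and it
assumes the catalog has one package per line with whitespace-separated
fields and the package file name in the fourth field.

```python
import os
import shutil


def rebuild_release(catalog_path, allpkgs_dir, release_dir):
    """Copy every package file named in the catalog into release_dir.

    Returns the file names that the catalog lists but that are
    missing from allpkgs_dir.
    """
    os.makedirs(release_dir, exist_ok=True)
    missing = []
    with open(catalog_path) as catalog:
        for line in catalog:
            fields = line.split()
            # Assumed layout: name version pkgname filename md5 size ...
            # Skip comments and anything too short to be a package line.
            if len(fields) < 4 or fields[0].startswith("#"):
                continue
            filename = fields[3]
            source = os.path.join(allpkgs_dir, filename)
            if os.path.exists(source):
                shutil.copy2(source, os.path.join(release_dir, filename))
            else:
                missing.append(filename)
    return missing
```

Anything that comes back in the missing list is a package the catalog
refers to but allpkgs/ no longer has, which is exactly the case where
the old catalog could not be reproduced.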

> Generally, the oldpkgs archive that people often wanted was there
> because of the rolling release of the 'current' catalog. At the time,
> the thinking was that first we scrutinize the hell out of the package, and
> once we prove to ourselves it's a good package, we push it forward
> with no good way of rolling it back. If we pushed something bad, we
> had nothing to say to people other than “scavenge oldpkgs”. These
> days, we have the testing release; we know that we will every now and
> then push something bad to unstable, and that people can still use
> the testing release, and we don't have to panic when there's something
> broken in unstable. The assumption there is, of course, that the majority
> of problems will be caught while in unstable.

Right.

> We also keep the old named releases, for instance the dublin release,
> which is a consistent set of packages and their dependencies.

Yes.

> The 'allpkgs' directory does contain a history of packages, but it
> doesn't contain all 19 different transient (and often broken) versions
> of MySQL that happened to be in unstable for 2 days, but only the 3 or
> 4 versions that are actually used somewhere in our catalogs.

If it were only for this, we wouldn't need allpkgs at all.

> I thought
> that was good enough. If you disagree, I will put the 24GB of junk
> back in allpkgs; but I remain unconvinced that they are actually
> useful.


Hopefully I convinced you with the above arguments.


Best regards

  -- Dago

-- 
"You don't become great by trying to be great, you become great by wanting to do something,
and then doing it so hard that you become great in the process." - xkcd #896


