[csw-maintainers] Garbage collection in allpkgs
Dagobert Michelsen
dam at opencsw.org
Mon Dec 31 13:23:14 CET 2012
Hi Maciej,
On 30.12.2012 at 19:53, Maciej (Matchek) Bliziński <maciej at opencsw.org> wrote:
> 2012/12/29 Dagobert Michelsen <dam at opencsw.org>:
>>> Could we do the
>>> same somewhere on the buildfarm but not on the master mirror? Then we
>>> would have an official archive for those that need it but since it
>>> wouldn't be used that much it would be unnecessary to mirror it, we
>>> would just link to it from our mirror page.
>>
>> This is already the case: allpkgs/ is not included in the main rsync
>> offering, just in opencsw-full:
>>
>>> dam at login [login]:/home/dam > rsync rsync://mirror.opencsw.org
>>> csw             Legacy name, please switch to the identical 'opencsw'
>>> opencsw         CSW Primary Mirror, use this if you are mirroring OpenCSW (the archive "allpkgs" is now in 'opencsw-full')
>>> opencsw-full    CSW Primary Mirror, contains full archive of old packages
>>> opencsw-future  The proposed future layout of the OpenCSW Primary, layout may change without notice at any time
>>
>>
>> This is done by using the exclude-directive in rsync.conf for "csw" and "opencsw":
>> exclude = allpkgs HEADER.txt
>>
>> Having all packages on the primary mirror is also good IMHO. This way
>> each downstream-site can easily select what to offer.
>
> I'm not sure what you mean by downstream-sites selecting what to
> offer.
Official sites mirroring our packages.
> The primary mirror has a set of files, and that's it.
Not quite. All files in the filesystem are available for download.
However, if you rsync "opencsw" you won't get allpkgs/, so almost all
of the official mirror sites don't mirror allpkgs/.
> People
> can make snapshots from different points in time, is that what you
> mean?
No, that is different. We don't do this ATM, but we do archive the
catalog files, so if someone has a specific problem we can regenerate
everything from that catalog and allpkgs/. This is another reason why
I think having allpkgs is a Good Thing™.
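To make the "regenerate from catalog and allpkgs/" idea concrete, here is a rough sketch only, not our actual tooling. It assumes that the fourth whitespace-separated field of each catalog entry is the package file name, that allpkgs/ is a flat directory, and that source and target live on the same filesystem so hard links work. The function names and paths are made up for illustration.

```python
import os


def catalog_filenames(catalog_path):
    """Yield the package file name of every entry in an archived catalog.

    Assumption: the file name is the fourth whitespace-separated field.
    """
    with open(catalog_path) as catalog:
        for line in catalog:
            fields = line.split()
            # Skip blank lines, '#' comments, signature armor and anything
            # too short to be a catalog entry.
            if len(fields) < 4 or line.startswith(("#", "-")):
                continue
            yield fields[3]


def regenerate(catalog_path, allpkgs_dir, target_dir):
    """Hard-link each package named in the catalog from allpkgs_dir into
    target_dir and return the names that could not be found."""
    os.makedirs(target_dir, exist_ok=True)
    missing = []
    for name in catalog_filenames(catalog_path):
        source = os.path.join(allpkgs_dir, name)
        dest = os.path.join(target_dir, name)
        if not os.path.exists(source):
            # This is exactly the case a pruned allpkgs/ would produce.
            missing.append(name)
        elif not os.path.exists(dest):
            os.link(source, dest)
    return missing
```

If regenerate() returns a non-empty list, the archived catalog can no longer be reproduced, which is the situation I would like to avoid by keeping allpkgs/ complete.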
> Generally, the oldpkgs archive that people often wanted was there
> because of the rolling release of the 'current' catalog. At the time,
> the thinking was that first we scrutinize the hell out of the package, and
> once we prove to ourselves it's a good package, we push it forward
> with no good way of rolling it back. If we pushed something bad, we
> had nothing to say to people other than “scavenge oldpkgs”. These
> days, we have the testing release; we know that we will every now and
> then push something bad to unstable, and that people can still use
> the testing release, and we don't have to panic when there's something
> broken in unstable. The assumption there is, of course, that the majority
> of problems will be caught while in unstable.
Right.
> We also keep the old named releases, for instance the dublin release,
> which is a consistent set of packages and their dependencies.
Yes.
> The 'allpkgs' directory does contain a history of packages, but it
> doesn't contain all 19 transient (and often broken) versions
> of MySQL that happened to be in unstable for 2 days, but only the 3 or
> 4 versions that are actually used somewhere in our catalogs.
If it were only for this, we wouldn't need allpkgs at all.
> I thought
> that was good enough. If you disagree, I will put the 24GB of junk
> back in allpkgs; but I remain unconvinced that they are actually
> useful.
Hopefully I convinced you with the above arguments.
Best regards
-- Dago
--
"You don't become great by trying to be great, you become great by wanting to do something,
and then doing it so hard that you become great in the process." - xkcd #896