[csw-maintainers] web visibility of packages

Ben Walton bwalton at opencsw.org
Mon Aug 8 03:02:48 CEST 2011


Excerpts from Maciej Bliziński's message of Sun Aug 07 05:38:27 -0400 2011:


> I'd suggest one modification: Instead of making it part of pkgdb
> (the backend of csw-upload-pkg), it could be a separate program
> which would reach out to both pkgdb and mantis for information, and
> that could be run asynchronously to package uploads.  This way, each
> component would be itself simpler, which is a point for
> maintainability.

This is workable, but is asynchronous action desirable?  If I upload a
new package to unstable, I'd like it to be visible right away in
mantis and on the web.

It is a nice point that it wouldn't require modification of pkgdb,
which increases the modularity (and decreases the coupling) of things.
As long as it ran reasonably frequently (maybe tied to the catalog
updating?) the delay between push and publish wouldn't be all that
bad.

I think I like Dago's leanings in the mantis area...screw the api.
Hit the db directly.  That's what the old scripts were doing and Dago
indicates that it isn't too bad to decipher the table structure.

> Looking at pkgdb_web.py, you can see this line:
> 
> r'/rest/srv4/([0-9a-f]{32})/', 'RestSrv4Detail',

This is quite easy to parse and understand.
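
As a quick illustration of what that route accepts, here is a minimal
sketch in Python.  The pattern is copied from the quoted line; the
helper name md5_from_path and the use of full-path matching are my
assumptions, not anything taken from pkgdb_web.py.

```python
import re

# Pattern copied from the pkgdb_web.py line quoted above.
SRV4_DETAIL = re.compile(r'/rest/srv4/([0-9a-f]{32})/')


def md5_from_path(path):
    """Return the md5 captured from a srv4 detail path, or None.

    Hypothetical helper for illustration only; it assumes the whole
    path must match the pattern.
    """
    match = SRV4_DETAIL.fullmatch(path)
    return match.group(1) if match else None
```

Only a lowercase 32-character hex digest followed by a trailing slash
gets through, so a client can validate an md5sum from the catalog
before building the URL.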

> We can add more fields to the JSON structure, or add more URLs.
> This RESTful interface has been written by me pretty much solo, and
> has not been reviewed.  I only implemented as much as I needed to
> get csw-upload-pkg to work.  I'll be happy to receive comments on
> how to develop it further.

It looks reasonable for data extraction, so that's enough to start
with.  If it requires modification for some reason, we can address
that.

A further advantage of this design is that there would be no daemon
requirement.  A simple cron job would suffice.

So, that would leave us with a job that:

* Pulls each catalog from pkgdb every period.
* If an entry exists in mantis, ensures the owner is correct.
* If an entry doesn't exist, adds it.
* Does the same for the web package database.
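
The steps above could be sketched roughly like this.  It is only a
sketch under loud assumptions: both the catalog and the target (mantis
or the web db) are stood in for by plain dicts mapping catalog name to
owner, and the function name reconcile is made up.  The real job would
read from pkgdb and write to the mantis tables instead.

```python
def reconcile(catalog, tracker):
    """Bring `tracker` in line with `catalog`.

    Both arguments are dicts mapping catalog name -> owner (a stand-in
    for the real pkgdb and mantis data).  `tracker` is updated in place
    and the list of actions taken is returned.
    """
    actions = []
    for name, owner in sorted(catalog.items()):
        if name not in tracker:
            # Entry doesn't exist yet: add it.
            tracker[name] = owner
            actions.append(("add", name))
        elif tracker[name] != owner:
            # Entry exists but the owner is wrong: correct it.
            tracker[name] = owner
            actions.append(("fix-owner", name))
    return actions
```

Running the same function against a second dict standing in for the
web package database covers the last step.  Note that this sketch never
removes anything from the target.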

The owner check will see each package queried (based on the md5sum in
the catalog) so the number of queries will grow linearly with the
catalog size.  Maybe we should only do owner checks daily or weekly
instead of at every run?

The package rename problem stems from the fact that we use catalog
name as the primary key in mantis but package name most everywhere
else.  This leads me to think about package removals...a rename in
pkgdb is a remove and then an add.  If this maintenance were tied
synchronously to pkgdb interaction, renames would no longer be a
problem for mantis maintenance.

Sticking with the asynchronous model though, removals aren't bad.
We'd need to extract the full list of packages from mantis and compare
to the catalog, but that's pretty easy.
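
The comparison is just a set difference.  A minimal sketch, assuming
both sides can be reduced to lists of catalog names (the function name
stale_entries is hypothetical):

```python
def stale_entries(mantis_names, catalog_names):
    """Return names present in mantis but absent from the catalog."""
    return sorted(set(mantis_names) - set(catalog_names))
```

A rename would show up here as the old name, with the new name coming
in separately as an add.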

I think that if we start at the mirror and use the generated catalogs
from bldcat, we get the perfect starting point to ensure mantis and
the web db are up to date...

Before I break ground and start writing code, does this sound ok to
everyone?  Any obvious gotchas here?

Thanks
-Ben
--
Ben Walton
Systems Programmer - CHASS
University of Toronto
C:416.407.5610 | W:416.978.4302


