[csw-maintainers] web visibility of packages

Mon Aug 8 11:13:43 CEST 2011

2011/8/8 Ben Walton <bwalton at opencsw.org>:
> Excerpts from Maciej Bliziński's message of Sun Aug 07 05:38:27 -0400 2011:
>
>
>> I'd suggest one modification: Instead of making it part of pkgdb
>> (the backend of csw-upload-pkg), it could be a separate program
>> which would reach out to both pkgdb and mantis for information, and
>> that could be run asynchronously to package uploads.  This way, each
>> component would be itself simpler, which is a point for
>> maintainability.
>
> This is workable, but is asynchronous action desirable?  If I upload a
> new package to unstable, I'd like it to be visible right away in
> mantis and on the web.

It could be a matter of setting expectations right.  Currently, you
run the utility and wait up to 3h for the confirmation email.  It's
reasonable for the mantis integration to work similarly.

> It is a nice point that it wouldn't require modification of pkgdb and
> thus increases the modularity (and decreases the coupling) of things.
> As long as it ran reasonably frequently (maybe tied to the catalog
> updating?) the delay between push and publish wouldn't be all that
> bad.

Yes, it makes sense to run it right after the existing db→disk integration.

> I think I like Dago's leanings in the mantis area...screw the api.
> Hit the db directly.  That's what the old scripts were doing and Dago
> indicates that it isn't too bad to decipher the table structure.

Fair enough. We'll do that.

> So, that would leave us with a job that:
>
> * Pulls each catalog from pkgdb every period.

The assumption behind the way mantis integration was handled in the
past is that there is only one catalog ('current').  What if we had
e.g. package foo, released by maintainer A to testing and by
maintainer B to unstable?  Perhaps we should for now assume that there
is only one catalog ('unstable'), against which bugs are filed.  Even
if maintainer A is the person who released the package to testing,
it's maintainer B who is actively working on the package.

> * If the entry exists in mantis, ensure the owner is correct.
> * If the entry doesn't exist, add it.
> * Do the same for the web package database.
>
> The owner check will see each package queried (based on the md5sum in
> the catalog) so the number of queries will grow linearly with the
> catalog size.  Maybe we should only do owner checks daily or weekly
> instead of at every run?

The integration job could keep a cache of the md5sum→maintainer
mapping and only query the db for new packages.
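
A minimal sketch of such a cache, assuming the maintainer recorded for
a given md5sum never changes.  The JSON cache file and the
lookup_maintainer callable are hypothetical stand-ins, not existing
pkgdb interfaces:

```python
import json
import os


def load_cache(path):
    """Return the saved md5sum -> maintainer mapping, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_cache(path, cache):
    """Write the md5sum -> maintainer mapping back to disk."""
    with open(path, "w") as f:
        json.dump(cache, f)


def resolve_maintainers(catalog_md5s, cache, lookup_maintainer):
    """Map every md5sum in the catalog to a maintainer.

    lookup_maintainer is a hypothetical callable that queries the db for
    one md5sum; it is only called for md5sums missing from the cache.
    Returns (mapping, number_of_db_queries).
    """
    queries = 0
    result = {}
    for md5 in catalog_md5s:
        if md5 not in cache:
            cache[md5] = lookup_maintainer(md5)
            queries += 1
        result[md5] = cache[md5]
    return result, queries
```

With this shape, the number of db queries per run depends on how many
packages are new since the last run, not on the catalog size.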

> The package rename problem stems from the fact that we use catalog
> name as the primary key in mantis but package name most everywhere
> else.  This leads me to think about package removals...a rename in
> pkgdb is a remove and then an add.  If this maintenance were tied
> synchronously to pkgdb interaction, renames would no longer be a
> problem for mantis maintenance.

csw-upload-pkg doesn't understand the notion of a rename, so this
information couldn't be used in the mantis integration.

> Sticking with the asynchronous model though, removals aren't bad.
> We'd need to extract the full list of packages from mantis and compare
> to the catalog, but that's pretty easy.

We could start with just displaying what would need to be done for
the two to match.
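
A rough sketch of such a report-only pass.  It assumes both sides have
already been read into plain catalogname -> maintainer dicts; how they
are extracted from mantis and from the catalog is left out, and the
function names are made up for illustration:

```python
def plan_sync(catalog, mantis):
    """Compare two catalogname -> maintainer mappings.

    Returns the actions needed to make mantis match the catalog,
    without performing any of them.
    """
    to_add = sorted(set(catalog) - set(mantis))
    to_remove = sorted(set(mantis) - set(catalog))
    to_reassign = sorted(
        name for name in set(catalog) & set(mantis)
        if catalog[name] != mantis[name]
    )
    return {"add": to_add, "remove": to_remove, "reassign": to_reassign}


def format_plan(plan):
    """Render the plan as one 'action catalogname' line per change."""
    lines = []
    for action in ("add", "remove", "reassign"):
        for name in plan[action]:
            lines.append("%s %s" % (action, name))
    return "\n".join(lines)
```

Since nothing is written back, this could be run against the live
mantis db to see how far the two have drifted before any automatic
changes are enabled.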

> I think that if we start at the mirror and use the generated catalogs
> from bldcat, we get the perfect starting point to ensure mantis and
> the web db are up to date...

+1

> Before I break ground and start writing code, does this sound ok to
> everyone?  Any obvious gotchas here?

You are thinking about writing the buildfarm→package_db integration, right?

Maciej