[csw-maintainers] Checkpkg database update and new checkpkg tests

Maciej (Matchek) Bliziński maciej at opencsw.org
Fri Jan 18 15:46:40 CET 2013


The package metadata stored in the database is now roughly 3 times
bigger. The local cache in my home directory is now 2.3GB:

maciej at login [login]:~/src/opencsw-gar/v2 > ls -lh *.db
-rw-r--r--   1 maciej   csw          62K Jan 14 10:05 pkg.db
-rw-r--r--   1 maciej   csw         3.0M Jan 18 02:03 pkgstats-deps.db
-rw-r--r--   1 maciej   csw         2.3G Jan 18 02:03 pkgstats.db

We're seeing other problems such as timeouts on the RESTful interface,
out-of-memory errors and quota hits. We have to decide whether this is
acceptable performance for our buildfarm. There are a number of things that
we can do to make things better, but since they all involve
infrastructure changes, they have to be done carefully.

I'm on vacation next week, so I won't make any changes now or next week.
The first realistic date for me to do anything more involved is around the
9th of February, which is roughly 3 weeks from now. For now, let's talk
about what options we have.

- change from pickles to JSON as the MySQL-side storage format (should
speed up the RESTful interface, since there would be no more unpickling
followed by conversion to JSON)
- storing compressed JSON is also an option, but it will require constant
decoding, which can be slow on sparc. We'd have to do some measurements to
see which is faster, i.e. whether we are I/O bound or processor bound. We
could use a lightweight compression library such as snappy.
http://code.google.com/p/snappy/
- split the single blob with everything (per package) into several blobs
per package (quite a bit of coding will be involved; requires changes to
the REST interface); smaller blobs will be easier to handle and we'll avoid
retrieving data we don't want; the price we'll pay is more complexity in
the code. So far I've been avoiding complexity as much as possible
(contrary to what it might look like :-P ), and having one memory structure
for all data (per package) has allowed us to keep the code simple in a lot
of places.
- change the underlying storage, although this would be quite involved and
the benefit is not certain; I've read that using MySQL as a key-value store
can work just fine.
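
To make the "do some measurements" part concrete, here is a rough sketch of
how the storage formats could be compared on a buildfarm host. It is only an
illustration under assumptions: the data structure built by
make_fake_pkgstats() is made up and does not match the real checkpkg
schema, all function names are hypothetical, and zlib from the standard
library stands in for snappy, so the timings would not carry over to snappy
directly.

```python
import json
import pickle
import time
import zlib


def make_fake_pkgstats(n_files=2000):
    """Build a made-up per-package structure (NOT the real checkpkg schema)."""
    return {
        "basic_stats": {"pkgname": "CSWexample", "catalogname": "example"},
        "files": [
            {"path": "/opt/csw/lib/libexample.so.%d" % i,
             "mode": "0755",
             "size": i * 10}
            for i in range(n_files)
        ],
    }


def encode_pickle(data):
    return pickle.dumps(data, protocol=2)


def encode_json(data):
    return json.dumps(data).encode("utf-8")


def decode_json(blob):
    return json.loads(blob.decode("utf-8"))


def encode_json_zlib(data):
    # zlib is a stand-in for snappy here (assumption, see the text above).
    return zlib.compress(encode_json(data))


def decode_json_zlib(blob):
    return decode_json(zlib.decompress(blob))


CODECS = {
    "pickle": (encode_pickle, pickle.loads),
    "json": (encode_json, decode_json),
    "json+zlib": (encode_json_zlib, decode_json_zlib),
}


def measure(data, repeat=5):
    """Return {codec name: (blob size in bytes, mean decode time in seconds)}."""
    results = {}
    for name, (encode, decode) in CODECS.items():
        blob = encode(data)
        start = time.perf_counter()
        for _ in range(repeat):
            decoded = decode(blob)
        elapsed = (time.perf_counter() - start) / repeat
        # Every codec must give back exactly what was stored.
        assert decoded == data
        results[name] = (len(blob), elapsed)
    return results


if __name__ == "__main__":
    for name, (size, seconds) in sorted(measure(make_fake_pkgstats()).items()):
        print("%-10s %10d bytes  %.6f s per decode" % (name, size, seconds))
```

Running this on both a sparc and an x86 host with a few real blobs pulled
from the database, instead of the fake data, would give a first hint as to
whether decoding cost or blob size dominates.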

Maciej