<div dir="ltr"><div style>The package metadata stored in the database is now roughly 3 times bigger. The local cache in my home directory is now 2.3GB:</div><div><br></div><div>maciej@login [login]:~/src/opencsw-gar/v2 > ls -lh *.db</div>
<div>-rw-r--r-- 1 maciej csw 62K Jan 14 10:05 pkg.db</div><div>-rw-r--r-- 1 maciej csw 3.0M Jan 18 02:03 pkgstats-deps.db</div><div>-rw-r--r-- 1 maciej csw 2.3G Jan 18 02:03 pkgstats.db</div>
<div class="gmail_extra"><br></div><div class="gmail_extra" style>We're also seeing other problems, such as timeouts on the RESTful interface, out-of-memory errors and quota hits. We have to decide whether this is acceptable performance for our buildfarm. There are a number of things we can do to improve it, but since they all involve infrastructure changes, they have to be done carefully.</div>
<div class="gmail_extra" style><br></div><div class="gmail_extra" style>I'm on vacation next week, so I won't make any changes now or next week. The first realistic date for me to do anything more involved is around the 9th of February, which is roughly 3 weeks from now. For now, let's talk about what options we have.</div>
<div class="gmail_extra" style><br></div><div class="gmail_extra" style>- change from pickles to JSON as the MySQL-side storage format (this should speed up the RESTful interface, since there would be no more unpickling + jsonizing)</div><div class="gmail_extra" style>
- storing compressed JSON is also an option, but it will require constant decoding, which can be slow on SPARC. We'd have to do some measurements to see which is faster, i.e. whether we are I/O bound or processor bound. We could use a lightweight compression library such as snappy. <a href="http://code.google.com/p/snappy/">http://code.google.com/p/snappy/</a></div>
<div class="gmail_extra" style>- split the single blob with everything (per package) into several blobs per package (quite a bit of coding will be involved, and it requires changes to the REST interface); smaller blobs will be easier to handle and we'll avoid retrieving data we don't want; the price we'll pay is more complexity in the code. So far I've been avoiding complexity as much as possible (contrary to what it might look like :-P ), and having one memory structure for all data (per package) allowed us to keep the code simple in a lot of places.</div>
<div class="gmail_extra" style>- change the underlying storage, although this would be quite involved and the benefit is not certain; I've read that using MySQL as a key-value store can work just fine.</div><div class="gmail_extra" style>
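<div class="gmail_extra" style><br></div><div class="gmail_extra" style>For the JSON options, the measurement could look roughly like the sketch below. It is only an illustration: the sample structure is made up (the real pkgstats layout isn't shown here), zlib from the Python standard library stands in for snappy, the names make_sample_stats, encoders and measure are hypothetical, and Python 3 is assumed. It compares the blob size and the decode time of pickle, plain JSON and compressed JSON.</div>

```python
import json
import pickle
import time
import zlib


def make_sample_stats(n_files=2000):
    """Build a made-up per-package structure (NOT the real pkgstats layout)."""
    return {
        "basic_stats": {"pkgname": "CSWexample", "version": "1.0,REV=2013.01.18"},
        "pkgmap": [
            {"path": "/opt/csw/share/example/file%05d" % i, "mode": "0644", "type": "f"}
            for i in range(n_files)
        ],
    }


def encoders():
    """Map a format name to an (encode, decode) pair working on bytes."""
    return {
        "pickle": (lambda obj: pickle.dumps(obj, protocol=2), pickle.loads),
        "json": (
            lambda obj: json.dumps(obj).encode("utf-8"),
            lambda blob: json.loads(blob.decode("utf-8")),
        ),
        # zlib is a stand-in here; snappy would need a third-party binding.
        "json+zlib": (
            lambda obj: zlib.compress(json.dumps(obj).encode("utf-8")),
            lambda blob: json.loads(zlib.decompress(blob).decode("utf-8")),
        ),
    }


def measure(obj, repeats=20):
    """Return the blob size and average decode time for each format."""
    results = {}
    for name, (encode, decode) in encoders().items():
        blob = encode(obj)
        start = time.perf_counter()
        for _ in range(repeats):
            decoded = decode(blob)
        elapsed = (time.perf_counter() - start) / repeats
        assert decoded == obj  # every format must round-trip the data
        results[name] = {"bytes": len(blob), "decode_seconds": elapsed}
    return results


if __name__ == "__main__":
    for name, row in measure(make_sample_stats()).items():
        print("%-10s %10d bytes  %.6f s/decode" % (name, row["bytes"], row["decode_seconds"]))
```

<div class="gmail_extra" style>To say anything about I/O bound versus processor bound, this would have to run on the SPARC buildfarm hosts against real package blobs, since the made-up sample here compresses unrealistically well.</div>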
<br></div><div class="gmail_extra" style>Maciej</div></div>