[csw-maintainers] progress report

Sun Sep 11 15:48:24 CEST 2011

2011/9/11 Ben Walton <bwalton at opencsw.org>:
> Yes, 100% normalization may not be ideal for this as there would be
> several heavily used tables with less than 5 objects stored.

Do you mean dictionary tables?

The main reason for denormalization was this: when William was
designing his package database, he had a number of tables, such as:

- software
- software + version
- software + version + package revision
- software + version + package revision + architecture
- software + version + package revision + architecture + OS release

...and so on.  The more complex the primary key, the more packages in
the table.  To keep this kind of normalized structure, you need to
instantiate a lot of intermediate objects before you can store
information about your svr4 file (I intentionally avoid the term
'package').

In the buildfarm database, the approach is different. There is one
main table with the svr4 files.  Each row has fields: software
version, package revision, architecture, OS release, etc.  There are
indexes for these fields, so if you want to find packages for certain
OS release an architecture, you just specify the right filters in your
SQL query, and lookups should be reasonably fast.

http://paste.pocoo.org/show/474057/

One quirk about the table is that there are two columns referring to
the architecture.  Why is that?  There are two places in a package
that describe the architecture: pkginfo and the file name.  They in
practice do not always agree.  The buildfarm database does not try to
model the perfect world, it aims to reflect the reality, so if
packages in reality sometimes have mismatched information, I want the
database to be able to store that.  Hence, arch_id and
filename_arch_id.

Maciej