[csw-users] Catalog Still BROKEN (in a new way)

Dennis Clarke blastwave at gmail.com
Fri Jun 6 15:39:06 CEST 2008


On Fri, Jun 6, 2008 at 6:15 AM, Alessio <a.cervellin at acm.org> wrote:
> The catalog should be finally fully fixed today...
>
> try again and let us know, thanks
>

This is something that I do independently.

I have scripts that check the catalog.

What they do is this :

1 ) First, generate an output file with the md5 hash of every package
     file in a given directory. Each line is a valid "package triplet":
     the filename, then the md5 hash, then the file size in bytes.
2 ) Then check that there is one entry, and only one entry, in the
     catalog for each package triplet (filename, md5 hash and file size).
3 ) The output of (2) may legitimately show zero catalog entries for a
     given md5 hash, because files are often pre-released before the
     new catalog is pushed out. That is fine. However, there should be
     no output where a given md5 hash or triplet has more than one
     entry in the catalog.
4 ) Finally, check in reverse that every entry in the catalog with an
     md5 hash also exists as a file ( or link ) in that directory. If
     anything is missing, there is a problem.

Let me show you by example.

Suppose we have the sparc architecture catalog for Solaris 8 and we cd
to the correct directory where those packages and catalog and
descriptions files reside.

First I look in that directory for any files that are NOT packages :
# pwd
/CSW/unstable/sparc/5.8
# find . -type f | grep -v "CSW\.pkg"
./catalog.old
./descriptions.old
./descriptions
./catalog
#

As a common sense check, I count how many signatures step (1) should
produce :
# find . -type f | grep -c "CSW\.pkg"
1805

I then generate the md5 hashes :

# /root/bin/step1_gen_sig.sh

Which results in :

# wc -l /tmp/package_sigs.txt
    1805 /tmp/package_sigs.txt

Each line has three fields: filename, md5 hash and file size in bytes.
That data, in simple ASCII fields separated by spaces, is what I call a
"package triplet".
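The step (1) script itself was not posted. As a rough illustration only, a generator for these triplets could look like the following shell sketch. The names `gen_sigs` and `md5_of` are hypothetical, and it assumes a POSIX shell with GNU `md5sum`, falling back to Solaris `digest -a md5` (assumed to print only the hash for a single file). It uses a glob instead of `find -type f` so that packages present as symlinks are hashed too, since `[ -f ]` follows links.

```shell
# Print the md5 hash of one file. Assumes GNU md5sum is available;
# otherwise falls back to Solaris digest(1), assumed to print only the hash.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | awk '{ print $1 }'
    else
        digest -a md5 "$1"
    fi
}

# Hypothetical helper: gen_sigs DIR
# Prints one "filename md5 size" triplet per *CSW.pkg* file in DIR.
gen_sigs() {
    for f in "$1"/*CSW.pkg*; do
        [ -f "$f" ] || continue            # skip an unmatched glob; follows symlinks
        name=$(basename "$f")
        sum=$(md5_of "$f")
        size=$(wc -c < "$f" | tr -d ' ')   # some wc versions pad with spaces
        printf '%s %s %s\n' "$name" "$sum" "$size"
    done
}
```

For example, `gen_sigs /CSW/unstable/sparc/5.8 > /tmp/package_sigs.txt` would produce a file of the same shape as the one above.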

# grep gzip /tmp/package_sigs.txt
gzip-1.3.12,REV=2008.01.03-SunOS5.8-sparc-CSW.pkg 87e1279512a6f9def6be660b9325522e 355840
mod_gzip-1.3.26.1,REV=2003.07.07.a-SunOS5.8-sparc-CSW.pkg.gz 63007d4df538ae8c68fe5050107703de 108088
pkgzip-1.2-all-CSW.pkg.gz 1a8ac5b5bf63c4d8d06c2b2b17df75cb 6284

Logic dictates that there should be an entry in the catalog with
exactly the same data, character for character, and not just the same
md5 hash :

# grep -c 1a8ac5b5bf63c4d8d06c2b2b17df75cb catalog
1
# grep -c "pkgzip-1.2-all-CSW.pkg.gz 1a8ac5b5bf63c4d8d06c2b2b17df75cb 6284" catalog
1
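One caveat with plain `grep`: the pattern is a regular expression, so each `.` in the filename matches any character, and a triplet can also match inside a longer filename that ends with the same text. The md5 makes false hits unlikely in practice, but a stricter per-triplet count can be sketched with `awk`, matching the triplet only as whole space-separated tokens. `triplet_count` is a hypothetical name, and it assumes the catalog separates fields with single spaces, as the `grep` above implies.

```shell
# Hypothetical helper: triplet_count "FILENAME MD5 SIZE" CATALOG
# Counts catalog lines containing the triplet as a fixed string bounded
# by spaces or line ends (no regex, no partial-filename matches).
triplet_count() {
    awk -v t="$1" 'index(" " $0 " ", " " t " ") { n++ } END { print n + 0 }' "$2"
}
```

With the catalog above, `triplet_count "pkgzip-1.2-all-CSW.pkg.gz 1a8ac5b5bf63c4d8d06c2b2b17df75cb 6284" catalog` would be expected to print the same count as the `grep`.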

So there you see that the catalog has valid data ( at least for those
three fields ) for the package pkgzip-1.2.

So in step (2) we need to check for each and every package signature triplet.

# /root/bin/step2_verify_sigs_in_catalog.sh
#

No output, which is perfect.
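Running one `grep` per triplet means reading the catalog 1805 times. As an alternative sketch (not the posted script), step (2) can be done in a single `awk` pass: index every run of three consecutive fields in the catalog, then look each triplet from step (1) up. `verify_sigs_in_catalog` is a hypothetical name; like the original check, it stays silent for a count of zero and prints only triplets found on more than one catalog line.

```shell
# Hypothetical helper: verify_sigs_in_catalog SIGS_FILE CATALOG
# Prints "count filename md5 size" for every step (1) triplet that appears
# on more than one catalog line. A count of zero is tolerated.
verify_sigs_in_catalog() {
    awk '
        # Catalog (first file read): record every run of three consecutive
        # whitespace-separated fields, counted at most once per line.
        FILENAME == ARGV[1] {
            split("", seen)
            for (i = 1; i + 2 <= NF; i++) {
                key = $i " " $(i + 1) " " $(i + 2)
                if (!(key in seen)) { seen[key] = 1; count[key]++ }
            }
            next
        }
        # Signatures file: look each triplet up and report duplicates.
        {
            key = $1 " " $2 " " $3
            n = (key in count) ? count[key] : 0
            if (n > 1) print n, key
        }
    ' "$2" "$1"
}
```

Used as `verify_sigs_in_catalog /tmp/package_sigs.txt catalog`, no output would mean the same thing as above.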

That script ( step 2 ) has an intermediate step in which an output
file is created. The output looks like so :

# /tmp/sig_count.sh | head -600 | tail -4
1 68bb1fdb7723c0865530acd3bdd04e5c javasvn-1.4.5,REV=2007.11.18-SunOS5.8-sparc-CSW.pkg.gz
1 516c3bc67165a33053604f4bcb5c0116 jbig2dec-0.9,REV=2007.05.26-SunOS5.8-sparc-CSW.pkg.gz
1 903c3b5d4aac4e61da40e3cbd9ac409e jbigkit-1.6,REV=2007.05.25-SunOS5.8-sparc-CSW.pkg.gz
1 b8e39bf5c8c679728eae3f35bf64ebcc jboss3-3.2.6,REV=2005.02.03-SunOS5.8-all-CSW.pkg.gz

See that leading digit "1" there ?  That means that the package
triplet exists once and only once in the catalog and that is a good
thing.  Guess what, a zero is okay too. That just means that some
package has been pushed out but it is not in the catalog yet.
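The rule for reading these intermediate "count md5 filename" lines can be written as a small filter. This is a sketch with a hypothetical name, `flag_bad_counts`; its second argument is the lowest acceptable count, so 0 would fit this step, while 1 would fit the reverse check, where a zero is not acceptable.

```shell
# Hypothetical helper: flag_bad_counts FILE MIN
# FILE holds intermediate "count md5 filename" lines; MIN is the lowest
# acceptable count. Prints every non-empty line whose leading count is
# below MIN or above 1 (the "+ 0" forces a numeric comparison).
flag_bad_counts() {
    awk -v min="$2" 'NF && ($1 + 0 < min + 0 || $1 + 0 > 1)' "$1"
}
```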

Next comes the reverse check, in which each md5 hash in the catalog (in
fact, the whole package triplet) is checked to exist in the output from
step (1). That ensures that every single entry in the catalog actually
exists as a file.
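A single-pass sketch of this reverse check follows. It is an illustration, not the posted step 3 script: `verify_catalog_entries_exist` is a hypothetical name, and it assumes (as the `grep` example earlier suggests) that in each catalog line the md5 hash is a 32-character lowercase hex field preceded by the filename and followed by the size. Unlike step (2), any count other than exactly 1 is reported, zero included.

```shell
# Hypothetical helper: verify_catalog_entries_exist SIGS_FILE CATALOG
# Prints "count filename md5 size" for every catalog triplet that does not
# occur exactly once in the step (1) signatures file.
verify_catalog_entries_exist() {
    awk '
        # Signatures file (first file read): count each triplet.
        FILENAME == ARGV[1] { have[$1 " " $2 " " $3]++; next }
        # Catalog: assume any 32-character lowercase hex field is an md5
        # hash, preceded by the filename and followed by the size.
        {
            for (i = 2; i < NF; i++) {
                if (length($i) == 32 && $i ~ /^[0-9a-f]+$/) {
                    key = $(i - 1) " " $i " " $(i + 1)
                    n = (key in have) ? have[key] : 0
                    if (n != 1) print n, key
                }
            }
        }
    ' "$1" "$2"
}
```

Used as `verify_catalog_entries_exist /tmp/package_sigs.txt catalog`, silence would again indicate a clean catalog.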

This also creates an intermediate output file that looks like so :

# /tmp/sigger.sh | head -600 | tail -4
1 68bb1fdb7723c0865530acd3bdd04e5c javasvn-1.4.5,REV=2007.11.18-SunOS5.8-sparc-CSW.pkg.gz
1 516c3bc67165a33053604f4bcb5c0116 jbig2dec-0.9,REV=2007.05.26-SunOS5.8-sparc-CSW.pkg.gz
1 903c3b5d4aac4e61da40e3cbd9ac409e jbigkit-1.6,REV=2007.05.25-SunOS5.8-sparc-CSW.pkg.gz
1 b8e39bf5c8c679728eae3f35bf64ebcc jboss3-3.2.6,REV=2005.02.03-SunOS5.8-all-CSW.pkg.gz

That should look exactly like the data above.  Again, the leading
digit "1" shows me that the entries in the catalog actually exist as
real package files, once and once only. Any other result would be
flagged.

# /root/bin/step3_verify_sigs_in_catalog_exist_as_files.sh
#

No output: that is a perfect catalog.

I think that is a pretty stringent test of the catalog. Do you think I
missed anything?

Dennis


