[csw-maintainers] upcoming extension to catalog format: SITEFEATURES

James Lee james at opencsw.org
Sat Oct 23 11:57:09 CEST 2010

On 22/10/10, 16:35:12, Philip Brown <phil at bolthole.com> wrote regarding Re:
[csw-maintainers] upcoming extension to catalog format: SITEFEATURES:

> > So the tools, at first use, download the plain catalog file and for
> > that run use it.  At the next run, it peeks inside the existing
> > catalog before fetching the new one?
> >
> > Wouldn't it be nicer to just gzip the catalog as the default and
> > provide the uncompressed version for sites that haven't updated
> > pkgutil/pkg-get yet?

> well, for OUR site, both are going to be there from now on, so it's not
> an either/or thing.

> > Alternately, the tools could do:
> >
> > if gunzip is available and the .gz file is on the mirror, pull it.
> > else grab the plain file

> That is one approach. I'm not mandating behaviour of other people's
> tools. However, I personally recommend against it.

Seems perfect to me.  Something like:

wget -q "${SITE}/catalog.gz" || wget -q "${SITE}/catalog"
[ -f catalog.gz ] && gunzip -f catalog.gz

> Firstly, it's rude to go leaving a bunch of "ERROR: file not found"
> log messages on someone else's server.

You are oversensitive.  Tell this to the browser people about
favicon.ico, the phone people about apple-touch-icon.png, major robots
that don't learn that you have deleted files, or sites that don't update
links when you move pages.

If you are bothered, you can learn about a site from one probe and store
the result.
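
A minimal sketch of that probe-once idea, assuming a per-site hint file
under a hypothetical CACHE_DIR and hypothetical function names fetch and
fetch_catalog; none of this is taken from pkgutil or pkg-get:

```shell
# Hypothetical cache location: one small file per mirror holding "gz" or "plain".
CACHE_DIR="${CACHE_DIR:-$HOME/.catalog-probe}"

# fetch URL: thin wrapper so the download tool can be swapped out.
fetch() {
    wget -q "$1"
}

# fetch_catalog SITE: download the catalog into the current directory.
# catalog.gz is tried unless an earlier run recorded that the site lacks it.
fetch_catalog() {
    site=$1
    mkdir -p "$CACHE_DIR" || return 1
    # Turn the site URL into a safe file name for the hint.
    key=$(printf '%s\n' "$site" | sed 's/[^A-Za-z0-9._-]/_/g')
    hint="$CACHE_DIR/$key"

    # A previous probe failed: go straight to the plain file.
    if [ -f "$hint" ] && [ "$(cat "$hint")" = plain ]; then
        fetch "$site/catalog"
        return
    fi

    if fetch "$site/catalog.gz"; then
        echo gz > "$hint"
        gunzip -f catalog.gz
    else
        # Remember the miss so the next run does not ask for catalog.gz.
        echo plain > "$hint"
        fetch "$site/catalog"
    fi
}
```

A site costs one failed request at most; deleting its hint file forces a
fresh probe if the mirror later starts carrying catalog.gz.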

I'm not sure I follow the example in this proposal.  Putting the gz value
in the file makes no sense: by the time you have the file it's too late
to care what you should have done.  You can save it for next time,
although it might have changed by then.  Downloading a hints file takes
longer than probing, failing, and getting the truth.  And if you are
worried about 404s, you won't want to request a hints file that isn't on
someone else's mirror.

An alternative method is to try sending some headers:
$ wget --header 'Accept-Encoding: gzip' ...
$ file catalog
catalog:        gzip compressed data - deflate method

and try an "If-Modified-Since: ..." header too, to avoid downloading at
all when nothing has changed.
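
With the Accept-Encoding approach the saved file may come back gzipped or
plain depending on the server, so the client has to check before using
it.  A rough sketch of that check, looking at the two gzip magic bytes
instead of parsing file(1) output; the function names is_gzip and
unpack_if_gzip are made up for illustration, and a POSIX od is assumed:

```shell
# is_gzip FILE: true when FILE starts with the gzip magic bytes 1f 8b.
is_gzip() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = 1f8b ]
}

# unpack_if_gzip FILE: replace FILE with its decompressed content when it
# holds gzip data; leave it untouched otherwise.
unpack_if_gzip() {
    if is_gzip "$1"; then
        gzip -dc "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Usage sketch (SITE is assumed to hold the mirror URL):
#   wget -q -O catalog --header 'Accept-Encoding: gzip' "${SITE}/catalog"
#   unpack_if_gzip catalog
```

Either way the tool ends up with a plain catalog, and the mirror only
ever sees requests for a file it already has.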

