Using conditional GET or wget timestamping for the catalog files

Dagobert Michelsen dam at opencsw.org
Mon Oct 28 13:40:04 CET 2013


Hi Peter,

On 28.10.2013 at 13:34, Peter Bonivart <bonivart at opencsw.org> wrote:
> On Mon, Oct 28, 2013 at 11:05 AM, Dagobert Michelsen <dam at opencsw.org> wrote:
>> On 28.10.2013 at 10:37, Maciej (Matchek) Bliziński <maciej at opencsw.org> wrote:
>>> Hey Peter (B) and maintainers,
>>> 
>>> I spoke to Dago a few days ago, and we had a chat about a large portion of the traffic from our main mirror being just the catalog files, that is, the files named 'catalog' that are downloaded and re-downloaded countless times. The mirror can withstand it, but it's a constant stream of a few megabytes per second, day and night.
>> 
>> Some numbers: we constantly serve 3-4 MB per second. This is not a problem at the moment, as we
>> have a direct gigabit uplink to the internet, but it sums up to roughly
>> 10 TB per month. For comparison: Amazon would charge $0.120 per GB, resulting in $1200!
>> So I would like to take the initiative and save bandwidth now, while we still
>> have the cheap mirror.
>> 
>>> Perhaps this can be helped by using the conditional GET with the possible HTTP 304 Not Modified response, or timestamping. wget has an option to timestamp files, and it can issue just a HEAD request to skip downloading the whole file. Here's some information I found:
>>> 
>>> http://www.gnu.org/software/wget/manual/wget.html#Time_002dStamping
>>> 
>>> Have we considered this in the past? I don't recall it. Maybe we should take a look; it could be simple to implement, and we would save some bandwidth on our main mirror and on other mirrors worldwide.
>> 
>> Just adding --timestamping would already be a great benefit.
>> 
>> Peter, what do you think?
> 
> I could do some tests I guess. What I did was to make the default for
> expired catalogs 14 days but I think most people add -U to their
> command line all the time.

I think the main problem is with our Puppet provider; maybe it should be changed there.
When I talked to a downstream user, they were running updates with -U every 10 minutes on each client:
10 servers with 20 zones each = 200 downloads every 10 minutes = one download every 3 seconds
for just one "customer"!
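
To double-check the figures in this thread, here is a small back-of-the-envelope sketch. The function names are made up for illustration, and it assumes the 10 TB figure is a monthly total over a 30-day month with decimal units (1 TB = 1000 GB); neither assumption is spelled out above.

```python
# Rough sanity check of the numbers quoted in this thread.
# Assumptions (not from the thread): a 30-day month, decimal units.
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_downloads(servers, zones_per_server, interval_minutes):
    """Average gap between catalog downloads for one site."""
    clients = servers * zones_per_server
    return interval_minutes * 60 / clients


def monthly_traffic_tb(mb_per_second, days=30):
    """Total traffic in TB for a constant rate given in MB per second."""
    return mb_per_second * SECONDS_PER_DAY * days / 1_000_000


def cost_usd(terabytes, usd_per_gb):
    """Transfer cost at a flat per-GB price."""
    return terabytes * 1000 * usd_per_gb


print(seconds_between_downloads(10, 20, 10))  # 3.0
print(round(monthly_traffic_tb(3.5), 1))      # 9.1
print(round(cost_usd(10, 0.12)))              # 1200
```

So 3-4 MB per second does land close to 10 TB over 30 days, and 200 clients on a 10-minute schedule average one download every 3 seconds.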

> Is timestamping available in our old static wget binaries (those I
> distribute with pkgutil as a last resort)?


Probably; timestamping is a pretty basic feature based on HTTP headers.
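
For reference, the conditional GET idea can be sketched roughly like this. This is only an illustration in Python using the standard library, not how pkgutil or wget implement it; the function names are hypothetical, and it assumes the mirror sends a Last-Modified header and honors If-Modified-Since.

```python
import email.utils
import os
import urllib.error
import urllib.request


def conditional_headers(local_path):
    """Build an If-Modified-Since header from the local copy's mtime."""
    if not os.path.exists(local_path):
        return {}  # no local copy yet, so do a plain GET
    mtime = os.path.getmtime(local_path)
    return {"If-Modified-Since": email.utils.formatdate(mtime, usegmt=True)}


def fetch_if_newer(url, local_path):
    """Download url to local_path unless the server answers 304.

    Returns True if a new copy was written, False if the local copy
    is still current.
    """
    request = urllib.request.Request(url, headers=conditional_headers(local_path))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: nothing was transferred
            return False
        raise
    with open(local_path, "wb") as handle:
        handle.write(body)
    if last_modified:
        # Keep the server's timestamp so the next check compares like with like.
        stamp = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(local_path, (stamp, stamp))
    return True
```

With an unchanged catalog, the server only has to send the response headers, so the cost per check drops from the full catalog size to a few hundred bytes.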


Best regards

  -- Dago

-- 
"You don't become great by trying to be great, you become great by wanting to do something,
and then doing it so hard that you become great in the process." - xkcd #896
