[csw-maintainers] robots.txt

Trygve Laugstøl trygvis at opencsw.org
Mon Aug 31 23:15:48 CEST 2009


Philip Brown wrote:
> I am against putting in "limit" directives, unless there is a very
> clear, specific benefit.
> Unless I missed something, you mentioned only general-case things and
> the potential for broken robots.
> 
> Any robot that can't figure out how to properly prune /page vs.
> /page.php, in this day and age, is a broken robot.
> We should not modify our configs to make dumb robots look better than they are.

Actually, they're not. There's no way for the search engines out there to 
know that "packages" is the same resource as "packages.php" (they can do 
some magic, and I'm sure they do when computing page rank). However, they 
will still list both as hits; try a search for "site:opencsw.org CSWbash 
packages" on Google and you'll see.
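To illustrate, a single robots.txt entry along these lines would keep 
crawlers off the duplicate .php form (the path is just the packages.php 
example from above; a real list would have to cover every page that is 
reachable under both names):

  User-agent: *
  # Hide the .php alias so only the extensionless URL gets indexed
  Disallow: /packages.php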

The same goes for misconfigured sites where the same content is shown 
under different domains (for a current example, compare 
http://git.opencsw.org with http://mirrors.opencsw.org).
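robots.txt can paper over that case too: serve a blanket disallow from 
whichever host we decide is the duplicate (picking the canonical host is 
a separate decision, and a permanent redirect would be the more thorough 
fix):

  # robots.txt served only on the duplicate host,
  # e.g. mirrors.opencsw.org if git.opencsw.org is the canonical one
  User-agent: *
  Disallow: /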

Being a good web citizen is important if you want the best results for 
the users, and those URLs actually matter quite a bit. For the full 
story, read up on REST and HTTP.

--
Trygve



