[csw-maintainers] robots.txt
Trygve Laugstøl
trygvis at opencsw.org
Mon Aug 31 23:15:48 CEST 2009
Philip Brown wrote:
> I am against putting in "limit" directives, unless there is a very
> clear, specific benefit.
> Unless I missed something, you mentioned only general-case things, and
> the potential of broken robots.
>
> Any robot that can't figure out how to properly prune /page vs
> /page.php, in this day and age, is a broken robot.
> We should not modify our configs to make dumb robots look better than they are.
Actually, they're not broken. There's no way for the engines out there
to know that "packages" is the same resource as "packages.php" (they can
do some magic, and I'm sure they do when computing page rank). However,
they will still list both as hits; try searching for "site:opencsw.org
CSWbash packages" on Google and you'll see.
The same goes for misconfigured sites where the same content is served
under different domains (for a current example, compare
http://git.opencsw.org with http://mirrors.opencsw.org).
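The usual fix for that one is a permanent redirect from the alias to the
canonical host. Roughly something like this in the Apache vhost config,
assuming purely for the sake of the example that mirrors.opencsw.org is
the name we want to keep:

    <VirtualHost *:80>
        ServerName git.opencsw.org
        # send everything to the canonical host so only one set of URLs
        # ends up in the search indexes
        Redirect permanent / http://mirrors.opencsw.org/
    </VirtualHost>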
Being a good web citizen is important if you want the best results for
users, and those URLs actually matter quite a bit. For the full story,
read up on REST and HTTP.
--
Trygve