[csw-maintainers] robots.txt

Philip Brown phil at bolthole.com
Mon Aug 31 23:37:11 CEST 2009


2009/8/31 Trygve Laugstøl <trygvis at opencsw.org>:
> Philip Brown wrote:
>>
>> Any robot that can't figure out how to properly prune /page vs
>> /page.php, in this day and age, is a broken robot.
>> We should not modify our configs to make dumb robots look better than they
>> are.
>
> Actually, they're not. There's no way for the engines out there to know that
> "packages" is the same resource as "packages.php" (they can do some magic, and
> I'm sure they do when they're doing page ranking). However, they will still
> list both as hits. Try searching for "site:opencsw.org CSWbash packages" on
> Google and you'll see.

Err... that search phrase gives a very long... aha. Now I see.

You are referring, presumably, to the #1 and #3 results.

However, this rather underlines my area of concern.

They are referenced in quite different ways: "packages/CSWbash" vs.
"packages.php/bash".
That hints that Google "found" the pages in different ways, which in
turn hints that somewhere (probably on OUR site) something is
incorrectly referencing "packages.php/bash" instead of
"packages/bash".

I think we should fix those out-of-date reference styles, before
making anything canonical.
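To track those references down, something along these lines could be run over our generated pages. It is a minimal sketch using only Python's standard html.parser; the legacy prefix "packages.php/" and its replacement "packages/" are assumptions taken from the two URLs above, and the helper names are hypothetical, not anything that exists on the site today.

```python
from html.parser import HTMLParser

# Assumed old-style and new-style URL prefixes (hypothetical mapping).
LEGACY_PREFIX = "packages.php/"
CURRENT_PREFIX = "packages/"


class LegacyLinkFinder(HTMLParser):
    """Collects href values of <a> tags that still use the legacy prefix."""

    def __init__(self, legacy_prefix=LEGACY_PREFIX):
        super().__init__()
        self.legacy_prefix = legacy_prefix
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and self.legacy_prefix in value:
                self.found.append(value)


def find_legacy_links(html):
    """Return every legacy-style href found in an HTML string."""
    finder = LegacyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def suggest_fix(href):
    """Rewrite the first legacy prefix occurrence to the current style."""
    return href.replace(LEGACY_PREFIX, CURRENT_PREFIX, 1)


if __name__ == "__main__":
    sample = '<a href="/packages.php/bash">bash</a> <a href="/packages/bash">ok</a>'
    for href in find_legacy_links(sample):
        print(href, "->", suggest_fix(href))
```

Fixing the links at the source this way avoids having to tell the search engines anything at all.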

ALSO: the two results were from DIFFERENT TIMES. They are actually
"different" pages! If, in contrast, the output had been the same, Google
would probably have coalesced them into one, I would bet.

Hence I still stand by my statement that intelligent search engines
already handle this sort of thing "properly".
It is more appropriate for us to fix our internal links than to tell
search engines, "ignore our mangled links"!


