[csw-maintainers] robots.txt

Trygve Laugstøl trygvis at opencsw.org
Tue Sep 1 01:47:59 CEST 2009

Philip Brown wrote:
> 2009/8/31 Trygve Laugstøl <trygvis at opencsw.org>:
>> Philip Brown wrote:
>> The fact that they list both is to me a
>> sign that they store both. How would they determine the canonical one
>> anyway?
> I would say that the common-sense answer is, if "page", and
> "page.anything" exists (and the content is the same), then "page"
> should always be canonical.
> after all, if "site.com/docs" exists, and "site.com/docs.php"
> exists... odds are that "site.com/docs" will ALWAYS exist.. but
> someday, they may change the backend to be .pl, or .cgi, or
> whoknowswhat.
>>> Hence I still stand by my statement that intelligent search engines
>>> already handle this sort of thing "properly".
>>> It is more appropriate for us to fix our internal links, rather than tell
>>> search engines, "ignore our mangled links" !
>> I think that in this case fixing the references is the right solution.
> I'm glad we agree there. The annoying thing is that I'm not sure where
> the references are, at this point. From my searching through the
> pages, I don't see any obvious references to ".php" in our site. So
> assistance from other folks in hunting those down, would be
> appreciated.

It may very well be old links that were indexed at some point, and 
since we still return 200 OK for those URLs, Google will keep 
re-indexing them. Try adding a permanent (301) redirect from "page.php" 
to "page", and they should disappear after a while.
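As a minimal sketch of such a redirect, assuming (hypothetically) that the pages were served through a Python WSGI application, a small middleware can answer any request for "page.php" with a 301 pointing at "page". The name `strip_php_redirect` is made up for illustration; on a real web server this would normally live in the server configuration instead.

```python
from urllib.parse import quote


def strip_php_redirect(app):
    """Hypothetical WSGI middleware: "/page.php" gets a 301 to "/page".

    Only sensible if "/page" really serves the same content as "/page.php".
    """
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.endswith(".php"):
            # Drop the extension; fall back to "/" for a bare "/.php".
            target = path[: -len(".php")] or "/"
            # PEP 3333 hands PATH_INFO over as latin-1 decoded bytes,
            # so re-encode with latin-1 to reproduce the original URL.
            location = quote(target, safe="/", encoding="latin-1")
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query  # keep any query string intact
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Because the status is 301 rather than 200, crawlers are told the old URL is gone for good and should, over time, replace it with the extension-less one in their index.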

Another option might be to grep the logs for ".php" and check the 
referer field, if that is logged. That might give you a clue as well.
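A rough sketch of that log search, assuming the server writes the Apache/NCSA "combined" log format (the only common format that records the referer). It counts each requested ".php" path together with the page that linked to it; `php_referers` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed "combined" format:
# host ident user [date] "METHOD path PROTO" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def php_referers(lines):
    """Count (requested .php path, referer) pairs in combined-format lines."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in the combined format
        # Ignore any query string when deciding whether it is a .php hit.
        path = m.group("path").split("?", 1)[0]
        if path.endswith(".php"):
            counts[(path, m.group("referer"))] += 1
    return counts
```

Feeding it an open access log (for example `php_referers(open("access_log"))`, file name hypothetical) and printing `.most_common(20)` would show which pages, internal or external, still link to the ".php" URLs. A referer of "-" means the client sent none, which is typical for crawlers revisiting URLs they already know.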

This is where the Webmaster toolkit comes in handy; it can show you how 
Google views your site (not that I've used it myself, but that's what 
I've been told).

