[csw-maintainers] robots.txt

Philip Brown phil at bolthole.com
Tue Sep 1 01:17:52 CEST 2009


2009/8/31 Trygve Laugstøl <trygvis at opencsw.org>:
> Philip Brown wrote:
> The fact that they list both is, to me, a
> sign that they store both. How would they determine the canonical one
> anyway?

I would say that the common-sense answer is: if both "page" and
"page.anything" exist (and the content is the same), then "page"
should always be canonical.

After all, if "site.com/docs" exists and "site.com/docs.php"
exists, odds are that "site.com/docs" will ALWAYS exist, but
someday they may change the backend to be .pl, or .cgi, or
who knows what.
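
As a rough illustration of that rule (not a claim about how any real
search engine works), here is a minimal Python sketch. The function name
pick_canonical and the path-to-content dict are hypothetical, made up
for this example only.

```python
import posixpath


def pick_canonical(pages):
    """Map each URL path to the path that should be treated as canonical.

    `pages` is a hypothetical dict of {url_path: page_content}.
    Rule: if "page" and "page.<ext>" both exist with identical content,
    "page" wins; otherwise a path is its own canonical form.
    """
    canonical = {}
    for path, body in pages.items():
        stem, ext = posixpath.splitext(path)
        if ext and stem in pages and pages[stem] == body:
            # Extensionless twin exists and serves the same content.
            canonical[path] = stem
        else:
            canonical[path] = path
    return canonical


if __name__ == "__main__":
    site = {
        "/docs": "documentation",
        "/docs.php": "documentation",
        "/faq.php": "questions",
    }
    for path, canon in sorted(pick_canonical(site).items()):
        print(path, "->", canon)
```

Note that "/faq.php" stays its own canonical form here, since no
extensionless twin exists for it.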




>
>> Hence I still stand by my statement that intelligent search engines
>> already handle this sort of thing "properly".
>> It is more appropriate for us to fix our internal links, rather than tell
>> search engines, "ignore our mangled links" !
>
> I think that in this case fixing the references is the right solution.


I'm glad we agree there. The annoying thing is that I'm not sure where
the references are at this point. From my searching through the
pages, I don't see any obvious references to ".php" in our site. So
assistance from other folks in hunting those down would be
appreciated.
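
For anyone who wants to help with the hunt, one possible starting point
is a small Python script that scans a local copy of the site for link
attributes ending in ".php". This is only a sketch: the function names
and the regular expression are assumptions made for this example, and it
will not catch links that are generated at request time or that come
from outside the scanned tree.

```python
import re
from pathlib import Path

# Matches href/src/action attribute values that end in ".php",
# optionally followed by a query string or fragment.
PHP_LINK = re.compile(
    r'''(?:href|src|action)\s*=\s*["']([^"']*\.php(?:[?#][^"']*)?)["']''',
    re.IGNORECASE,
)


def find_php_links(text):
    """Return every .php link target found in one page's source."""
    return PHP_LINK.findall(text)


def scan_tree(root):
    """Walk a directory and report, per file, the .php links it contains."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            links = find_php_links(path.read_text(errors="replace"))
            if links:
                hits[str(path)] = links
    return hits


if __name__ == "__main__":
    # "." is a placeholder; point it at a checkout of the site sources.
    for filename, links in sorted(scan_tree(".").items()):
        for link in links:
            print(f"{filename}: {link}")
```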


