[csw-maintainers] robots.txt

Trygve Laugstøl trygvis at opencsw.org
Tue Sep 1 01:06:35 CEST 2009


Philip Brown wrote:
> 2009/8/31 Trygve Laugstøl <trygvis at opencsw.org>:
>> Philip Brown wrote:
>>> Any robot that can't figure out how to properly prune /page vs.
>>> /page.php, in this day and age, is a broken robot.
>>> We should not modify our configs to make dumb robots look better than they
>>> are.
>> Actually, they're not. There's no way for the engines out there to know that
>> "packages" is the same resource as "packages.php" (they can do some magic, and
>> I'm sure they do when they're doing page ranking). However, they will still
>> list both as hits; try searching for "site:opencsw.org CSWbash packages" on
>> Google and you'll see.
> 
> Err... that search phrase gives a very long... aha. Now I see.
> 
> You are referring, presumably, to the #1 and #3 results.

Not sure; the search results are most likely different for you and me, but
yes, there were references to most combinations of the page URL.

> However, this rather underlines my area of concern.
> 
> They are referenced in quite different ways: "packages/CSWbash" vs.
> "packages.php/bash".
> This hints that Google "found" the pages in different ways, which in
> turn hints that somewhere (probably on OUR site), something is
> incorrectly referencing "packages.php/bash" instead of
> "packages/bash".
> 
> I think we should fix those out-of-date reference styles, before
> making anything canonical.

Fixing references is always nice, but adding something (be it Apache
config or PHP code) to make sure that the ".php" variant is never used
is also required. Now that the links exist, a permanent (301) redirect
would be the correct solution.
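A minimal sketch of that permanent redirect, assuming the only rule needed is "drop '.php' from the first path segment" (so /packages.php/bash goes to /packages/bash). On the real site this would live in the Apache config or the PHP code as discussed; Python WSGI is used here only to illustrate the 301 logic, and the names canonical_path and redirect_php are hypothetical.

```python
from typing import Callable, Iterable, Optional

def canonical_path(path: str) -> Optional[str]:
    """Return the extension-less form of `path`, or None if it is already canonical.

    Hypothetical rule: only the first path segment is checked, so
    "/packages.php/bash" -> "/packages/bash" and "/page.php" -> "/page".
    """
    parts = path.split("/")
    # parts[0] is "" because a WSGI PATH_INFO starts with "/".
    if len(parts) > 1 and parts[1].endswith(".php"):
        parts[1] = parts[1][: -len(".php")]
        return "/".join(parts)
    return None

def redirect_php(app: Callable) -> Callable:
    """Wrap a WSGI app so any ".php" URL gets a 301 to its canonical form."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        target = canonical_path(environ.get("PATH_INFO", ""))
        if target is None:
            # Already canonical: hand the request to the real application.
            return app(environ, start_response)
        # Keep any mount prefix and the original query string.
        location = environ.get("SCRIPT_NAME", "") + target
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query
        # 301 tells crawlers the old URL is gone for good, so they can
        # replace it with the new one instead of listing both.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    return wrapper
```

Because the redirect is permanent rather than a 302, search engines are expected to fold the ".php" entries into the canonical URL over time, while old links elsewhere keep working.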

> ALSO: the two results were from DIFFERENT TIMES. They are actually
> "different" pages! If, in contrast, the output had been the same, Google
> would probably have coalesced them into one, I would bet.

I doubt that they would, as the number of candidate pages would be quite
big if they were to go down that path. To me, the fact that they list
both is a sign that they store both. How would they determine the
canonical one anyway?

> Hence I still stand by my statement that intelligent search engines
> already handle this sort of thing "properly".
> It is more appropriate for us to fix our internal links, rather than tell
> search engines, "ignore our mangled links"!

I think that in this case fixing the references is the right solution.

--
Trygve
