So with SES San Jose just around the corner, Dan Thies put up a great post detailing all the headaches a website owner can get when looking at his SERPs a bit more closely, or with the right tools to do so …
Dan calls this “Google Proxy Hacking”, but frankly, we are not hacking any of Google’s proxies – so I’m calling it Google Bowling via Proxy Sites – related to the older black hat term “Google Bowling” for buying too many / bad links for competitor sites to knock them off the SERPs. Yes, it IS possible to knock a competitor site off the SERPs, although Google says there is almost nothing a competitor can do to harm you (yeah, right – the Google folks weakened this message some months ago, because the “nothing” was plain wrong – and they knew it).
If you read through Dan’s post you might get headaches, just like this guy, from all those details and from the partly wrong promises of a cure: two solutions that BOTH address only the outdated part of the problem.
So I thought I had to illustrate for you what’s going on and what Google Bowling via Proxies actually looks like.

The above results are returned if you search for the unique phrase
related details is the CEMPER.COM expertise that you can order
which was only found on my company site cemper.com … (ok – now it’s also found on this marketingfan.com blog, and on marketingfan.at as soon as we translate it)
As you can see, this unique phrase, which should tell me whether my page is healthy, does not bring up my own site but “one of those PITA sites”, run by a guy called Matt Twine from the UK (if that IS his real name…)
and as you can imagine, the URL http://www.proxydust.com/index.php?q=aHR0cDovL3d3dy5jZW1wZXIuY29t has an EXACT copy of my company site’s home page there…
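By the way, the q parameter in that URL looks like nothing more than my home page address run through Base64. A quick Python sketch to check that; the assumption that the proxy uses plain standard Base64 is mine, and the helper name decode_proxy_target is made up for this example:

```python
import base64
from urllib.parse import urlparse, parse_qs

proxy_url = "http://www.proxydust.com/index.php?q=aHR0cDovL3d3dy5jZW1wZXIuY29t"

def decode_proxy_target(url):
    """Return the Base64-decoded value of the q parameter (assumed encoding)."""
    q = parse_qs(urlparse(url).query)["q"][0]
    return base64.b64decode(q).decode("ascii")

target = decode_proxy_target(proxy_url)
print(target)  # prints http://www.cemper.com
```

So every page on your site gets its own URL on the proxy, and each of those can end up in the index.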
Did I hear Spam Report? yadda yadda – don’t bother – the Googlers don’t seem to care, because I submitted that 2 weeks ago…
Now, clicking that “filter=0” link to reveal all search results, we see this HUGE list of pages – cemper.com coming second…. as a filtered result right after that proxy site used for Google Bowling…

[... pages cut out here … ]

But also we have a couple more scumbags stealing my content and trying to hijack my site…
In fact only the ProxyDust copy wins big time over CEMPER.COM because … believe it or not…
that fricking domain registered in January 2007 got a Wikipedia backlink
And my site does not.
I currently think that’s the main reason why Google chose them over my own site – which is from 2000, not heavily SEOed, but I bet a good deal more trusted than this Mark “Thief” Twine’s site.
Well, it might well be that Mark has NO CLUE about what he does, but all those ads plastered around my site indicate otherwise.
In fact it appears the whole strategy of running those proxy sites is to earn money from ads placed on other people’s content and cash in on their work…
Frankly, I love Dan’s general post as an introduction to this post, because I would have hated to explain it all at full length as he did. But what he points out as “solutions” are somewhat old school methods to identify bots that pretend to be Google, Yahoo or MSNbot….
Dan’s post ALSO contains a 2nd method: sending a “noindex, nofollow” to ALL visitors that do NOT
1) Identify as spiders
2) Pass a “valid IP address” test
Pretty cool – I think that might work – and will test this ASAP, in addition to my own method of blocking those scumbags.
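To make that more concrete, here is a minimal Python sketch of how I read the idea, not Dan’s actual script. The “valid IP address” test is done as a reverse DNS lookup followed by a forward confirmation; the host name suffixes and the function names is_verified_spider and robots_meta are my own assumptions for illustration:

```python
import socket

# Host name suffixes are an assumption for Googlebot only;
# extend the tuple for other search engines.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_spider(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the host name, then forward-confirm it."""
    try:
        host = reverse(ip)[0]
        if not host.endswith(TRUSTED_SUFFIXES):
            return False
        # The host name must resolve back to the very same IP.
        return ip in forward(host)[2]
    except OSError:
        # Lookup failed: treat the visitor as unverified.
        return False

def robots_meta(user_agent, ip, verify=is_verified_spider):
    """Return the robots meta tag to emit for this request, or '' for none."""
    claims_spider = "googlebot" in user_agent.lower()
    if claims_spider and verify(ip):
        return ""  # real spider: let it index the page
    return '<meta name="robots" content="noindex, nofollow">'
```

Human visitors don’t care about the meta tag, but a proxy that fetches the page and passes it on to Googlebot hands over the “noindex, nofollow” version.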
Further reading:
I discussed this with IncrediBill last week, who has a great post up on identifying fake bots – but his comment was also just
PROXYDUST appears to just pass thru the user agent as-is, hard to say without seeing an actual hijacking if they do something special with Googlebot.
Anyway, they operate out of uk2net and the easiest way to make sure you’ve got all their IPs is to just block the entire data center.
inetnum: 83.170.96.0 – 83.170.111.255
netname: UK2-NET
route: 83.170.96.0/20
and then
Automating it is sometimes proxy and behavior specific, nothing I could tell you how to do in a quick post.
Some of them actually slip through the cracks for a while until they reveal themselves so it’s not 100% bulletproof.
The only way to get most of them is to simply block all hosting centers.
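Checking a visitor against a range like the one Bill quotes is simple. A small Python sketch using the standard ipaddress module; the /20 comes from the whois record above, while the names BLOCKED_NETWORKS and is_blocked are just placeholders for this example:

```python
import ipaddress

# Range taken from the whois record quoted above; add more networks as needed.
BLOCKED_NETWORKS = [ipaddress.ip_network("83.170.96.0/20")]

def is_blocked(ip):
    """True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Anything from 83.170.96.0 up to 83.170.111.255 is then refused, while 83.170.112.0 passes. In practice you would probably do this in the web server or firewall rather than in the application.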
I actually blocked a TON of IP ranges, including those of a rogue bot called Twiceler, in the last 2 weeks…
but the “noindex” hack mentioned above is the next countermeasure…
I REALLY hope I can generalize this to protect ALL my sites without having to change all of them…
And then we got some more cool posts:
“10 Ways to protect your site from negative SEO”, where Hamlet uses “negative SEO” for all kinds of actions a competitor could take against you … frightening …. and “Never Ending SERP Hijacking”, where he correctly states that the REAL problem is those sites like proxydust that DO NOT pretend to be Google….
Has YOUR site been hijacked? Do you know?
How could you know? Just follow Jim’s post to find out if a page is in supplemental … but make sure you actually look at the results closely… because what you might find is that somebody is stealing your content….
You should do that for EVERY PAGE of your site – best case, with the right tools for it…. but it costs a lot of resources either way – by hand or by machine.
Let me know about YOUR hijack experiences !
(and I’m sure people should talk about this at the SES in San Jose , however I fear they won’t too much…)
Comments
A couple notes, Christoph - I hit one of my sites through Proxydust and they pass the UA through just fine. I'd check your scripts if you're inspecting by user agent and they're getting through.
Another thing to try, since Google is slow to recrawl and remove existing entries even if you return 403 Forbidden - when you know it's a proxy, return about 500 words of "lorem ipsum, sic dolor sit amet..." and that way Googlebot will index this as the new content of that proxy URL.
Lorem Ipsum... dammit - that's the way to go... it's obvious that Google cannot remove a page just because a server is down for some hours or even days (oh yeah - if you got the wrong hosting company that can happen: http://weblog.cemper.com/a/200701/19-50-hours-outage-admin-have-you-changed-something.php)
I'll check into this and of course implement that cloaking script TODAY on my site at least as a countermeasure for those other 1000s of proxy sites out there
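A rough Python sketch of the filler idea, just to show the logic. How you detect a known proxy is up to you (IP range, behavior…), so it is passed in as a plain flag here, and the names filler_text and body_for are invented for this example:

```python
import itertools

LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()

def filler_text(word_count=500):
    """Repeat the lorem ipsum words until word_count words are produced."""
    return " ".join(itertools.islice(itertools.cycle(LOREM), word_count))

def body_for(real_html, is_known_proxy):
    """Serve filler to known proxies and the real page to everyone else."""
    if is_known_proxy:
        return "<html><body><p>" + filler_text() + "</p></body></html>"
    return real_html
```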
Christoph C. Cemper
- the http://www.marketingfan.com