SpamRank – my interpretation

One of the papers I listed in The Top 15 SEO research papers is about SpamRank, whose goal is Fully Automatic Link Spam Detection.

While preparing for SES tomorrow, I want to share my observations and interpretations of that WWW2005 paper.

Goal

Obviously, search engines need a good, automatic way to identify link networks and link farms: sites that are built solely for inflating PageRank, or groups of webmasters who join together in a link ring or a link exchange network.

Method


The key assumption is that supporters of an honest page should not be overly dependent on one another, i.e. they should be spread across sources of different quality.

This means that having too many high-ranking sites linking to you is bad, while having only low-ranking sites is bad too: emulate the natural web.


As in the case of the entire Web, the PageRank distribution of an honest set of supporters should be power law.

You want a natural linking pattern for your sites.

The two key observations in detecting link farms


Portions of the Web are self-similar; an honest set of supporter pages arise by independent actions of individuals and organizations that build a structure with properties similar to the entire Web. In particular, the PageRank of the supporters (ed: the linking sites) follows a power law distribution just as the case for the entire Web.

Again, you want a natural linking pattern for your sites.
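To make the power-law idea a bit more tangible, here is a minimal sketch of how one could score a backlink profile. It is my own illustration, not the procedure from the paper: I assume the linking pages are already grouped into toolbar-style PR buckets (0, 1, 2, ...), and I use the Pearson correlation between the bucket number and the logarithm of the link count as a rough "naturalness" score. The function name `power_law_correlation` and both sample profiles are made up for this example.

```python
import math

def power_law_correlation(counts_by_pr):
    """Pearson correlation between PR bucket and log(number of linking pages).

    counts_by_pr: dict mapping a PR bucket (int) to the number of backlinks in it.
    A value close to -1 means the counts fall off geometrically as PR rises,
    which is what a power-law-like profile looks like on this scale.
    """
    # Empty buckets are skipped because log(0) is undefined.
    points = [(pr, math.log(n)) for pr, n in sorted(counts_by_pr.items()) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty buckets")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical profiles: one that decays geometrically, one dominated by bought high-PR links.
natural = {0: 243, 1: 81, 2: 27, 3: 9, 4: 3, 5: 1}
bought = {0: 2, 1: 1, 6: 10, 7: 25}

print(power_law_correlation(natural))
print(power_law_correlation(bought))
```

The geometric profile scores very close to -1, while the profile stuffed with high-PR links comes out positive, so it sticks out immediately. A real detector would work on actual PageRank values and far more data; this only shows the shape of the check.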


Link spammers have a limited budget; when boosting the PageRank of a target page, “unimportant” structures are not replicated.

If you need “unimportant” structures for your site, then go for low-value links like PR0, PR1, PR2, links from uncached pages, links from new sites, hell – any link – just make sure you build a GOOD, NATURAL (again) mix

So how can I lower my SpamRank?

These are my interpretations & recommendations only:

  1. Get a good mix of backlinks where the PR follows a power principle. With a base of 3, say, that means 3^5 = 243 PR0 links, 81 PR1, 27 PR2, 9 PR3, 3 PR4 and 1 PR5 … you get the idea?
  2. THEREFORE: Don’t bother buying links on high-PR pages like PR9 or PR8; they will surely stick out of your link profile immediately
  3. Look at your competitors and their backlink profiles and try to emulate that
  4. Get links from within content (presell pages), not only sidebar, navigational or footer links
  5. Add “useless” pages like About, Sitemap, Contact us, etc. to your sites
  6. Add “useless” links to your sites – some nofollow links, some PR0, some uncached
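The arithmetic behind recommendation 1 can be sketched in a few lines. This is only my illustration of the "power of 3" example, not a formula from the paper; the function name `geometric_link_profile` and its parameters `top_pr` and `base` are assumptions I introduce here.

```python
def geometric_link_profile(top_pr=5, base=3):
    """Backlinks per PR bucket when every step down in PR has `base` times more links.

    With top_pr=5 and base=3 there is 1 link at PR5, 3 at PR4, and so on
    down to 3**5 links at PR0.
    """
    return {pr: base ** (top_pr - pr) for pr in range(top_pr + 1)}

profile = geometric_link_profile()
total = sum(profile.values())

for pr, count in profile.items():
    # Share of the whole profile that each PR bucket represents.
    print(f"PR{pr}: {count} links ({count / total:.1%})")
print("total links:", total)
```

For the base-3 example this adds up to 364 links, and about two thirds of them are PR0. In other words, the bulk of a natural-looking profile is made of "unimportant" links.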

Hell, and don’t count on every single link… let your site evolve naturally. The days when you bought 3 PR7 links and got a PR6 the next month have been over for YEARS… (I still get questions from people who believe those 2003ish myths that are buried in some old abandoned webmaster forums and minds.)

Two other algorithms are cited, and I’m sure they are in development or production already:

B. Wu and B. D. Davison. Identifying link farm pages.

and, with 22 scientific citations and available only as a paid download from Springer, this one:

H. Zhang, A. Goel, R. Govindan, K. Mason, and B. V. Roy. Making eigenvector-based reputation systems robust to collusion.

More to read on the topic of link farms and link spam detection is listed here:

Yahoo’s Link Farm detection patent

Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection

Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings

Sidenote: Barry and others already reported from last year’s Chicago show (I think it was Chicago) that Matt Cutts from Google was playing with his notebook to point out rented links and links from link farms.

While it’s not clear that these algos are in effect for all search queries, all websites, all countries or languages, it’s certain that Google & Co have implemented MANY of them, as the pure calculation is pretty simple once you have the huge database of web site vectors (interlink data) that Google & Co have.


MarketingFan is a search marketing and SEO Blog that was started in 2004 and is maintained by Christoph C. Cemper and his team. If this post was interesting for you, then others might be as well!
