SpamRank – my interpretation
August 6th, 2006
One of the papers I listed in The Top 15 SEO research papers is about SpamRank, whose goal is Fully Automatic Link Spam Detection.
While preparing for SES tomorrow, I want to share my observations on and interpretations of that WWW2005 paper.
Goal
Obviously, search engines need a good, automatic way to identify link networks and link farms: sites that are built solely to inflate PageRank, or groups of webmasters that join together in a link ring or a link exchange network.
Method
The key assumption is that
“supporters of an honest page should not be overly dependent on one another, i.e. they should be spread across sources of different quality”
This means that having too many high-ranking sites linking to you is bad, and having only low-ranking sites is bad too. Emulate the natural web.
“as in the case of the entire Web, the PageRank distribution of an honest set of supporters should be power law.”
You want a natural linking pattern for your sites.
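To make the power-law idea a bit more tangible, here is a minimal Python sketch of how such a check could look. It is my own illustration, not the procedure from the paper: the function names (`bucket_counts`, `log_log_correlation`), the power-of-two buckets, the exponent 2.1 and both sample data sets are assumptions I made up for the example. The idea is that if supporter scores follow a power law, the log of the count per bucket falls on a roughly straight line, so the correlation between bucket index and log(count) should be close to -1.

```python
import math
from collections import Counter


def bucket_counts(scores):
    """Count scores per power-of-two bucket [2**k, 2**(k+1))."""
    return Counter(math.floor(math.log2(s)) for s in scores if s > 0)


def log_log_correlation(scores):
    """Pearson correlation between bucket index and log(count per bucket).

    Returns None when fewer than three buckets are occupied, because a
    straight-line fit says nothing useful in that case.
    """
    counts = bucket_counts(scores)
    if len(counts) < 3:
        return None
    xs = sorted(counts)
    ys = [math.log(counts[k]) for k in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_y == 0:
        return 0.0  # every bucket holds the same count: no trend at all
    return cov / math.sqrt(var_x * var_y)


# Made-up "honest" supporters: evenly spaced quantiles of a power-law
# distribution (exponent 2.1 is an arbitrary choice for this sketch).
n, exponent = 2000, 2.1
honest = [(1 - (i + 0.5) / n) ** (-1 / (exponent - 1)) for i in range(n)]

# Made-up "farm" supporters: a few weak pages plus 300 near-identical ones.
farm = [1.5] * 5 + [3.0] * 5 + [6.0] * 5 + [40.0] * 300

honest_r = log_log_correlation(honest)
farm_r = log_log_correlation(farm)
print(f"honest supporters: r = {honest_r:.3f}")
print(f"farm supporters:   r = {farm_r:.3f}")
```

In this toy setup the honest set should come out strongly negative, while the farm set, with its big block of identical supporters, does not. A real detector would need actual PageRank values for the supporters and a carefully chosen threshold, neither of which this sketch provides.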
The two key observations in detecting link farms
Portions of the Web are self-similar; an honest set of supporter pages arise by independent actions of individuals and organizations that build a structure with properties similar to the entire Web. In particular, the PageRank of the supporters (ed: the linking sites) follows a power law distribution just as the case for the entire Web.
Again, you want a natural linking pattern for your sites
Link spammers have a limited budget; when boosting the PageRank of a target page, “unimportant” structures are not replicated.
If you need “unimportant” structures for your site, then go for low-value links like PR0, PR1, PR2, links from uncached pages, links from new sites, hell – any link – just make sure you build a GOOD, NATURAL (again) mix
So how can I lower my SpamRank?
These are my interpretations & recommendations only:
- Get a good mix of backlinks where the number of links per PR level follows a power principle. With a power of 3, say, you would have 3^5 = 243 PR0, 81 PR1, 27 PR2, 9 PR3, 3 PR4 and 1 PR5 links … you get the idea?
- THEREFORE: Don’t bother buying links on high-PR pages like PR9 or PR8; they are sure to stick out of your link profile immediately
- Look at your competitors and their backlink profiles and try to emulate that
- Get links from within content (presell pages), not only sidebar, navigational or footer links
- Add “useless” pages like About, Sitemap, Contact us etc to your sites
- Add “useless” links to your sites – some nofollow links, some PR0, some uncached
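The power-of-3 mix from the first recommendation can be tabulated with a few lines of Python. This is only a sketch of my rule of thumb; the function name `backlink_mix` and its parameters `base` and `top_pr` are made up for the example, and nothing here comes from the paper.

```python
def backlink_mix(base=3, top_pr=5):
    """Return {pr_level: link_count} with base**(top_pr - level) links per level."""
    return {level: base ** (top_pr - level) for level in range(top_pr + 1)}


mix = backlink_mix()
total = sum(mix.values())

# Show each PR level with its link count and its share of the whole profile.
for level, count in mix.items():
    print(f"PR{level}: {count:4d} links ({count / total:.1%} of profile)")
```

With the defaults this gives the 243 / 81 / 27 / 9 / 3 / 1 split from above, 364 links in total, so roughly two thirds of the profile are PR0 links. Change `base` or `top_pr` to play with other mixes.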
Hell – and don’t count on every single link… make your site evolve naturally. The days when you bought 3 PR7 links and got a PR6 the next month have been over for YEARS… (I still get questions from people who believe those 2003ish myths that are buried in some old abandoned webmaster forums and minds)
Two other algorithms are cited and I’m sure they are in development or production already
B. Wu and B. D. Davison. Identifying link farm pages.
and, with 22 scientific citations and available only as a paid download from Springer, this one here
More to read on the topic of link farms and link spam detection is listed here
Yahoo’s Link Farm detection patent
Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection
Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings
Sidenote: Barry and others already reported from last year’s Chicago show (I think it was Chicago) that Matt Cutts from Google was playing with his notebook to point out rented links and links from link farms.
While it’s not clear that these algos are in effect for all search queries, websites, countries or languages, it’s sure that Google & Co have implemented MANY of them, as the pure calculation is pretty simple once you have the huge database of website vectors (interlink data) that Google & Co have.
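To illustrate how simple the calculation itself is once the link data is available, here is a minimal sketch of plain PageRank by power iteration on a toy graph. It is not SpamRank and not anybody’s production code: the function name `pagerank`, the damping factor 0.85, the 50 iterations and the four-page `toy_web` graph are all assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up interlink data: three supporters pointing at one target page.
toy_web = {
    "a": ["target"],
    "b": ["target"],
    "c": ["target", "a"],
    "target": ["a"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

The hard part for a search engine is collecting and storing the link graph for billions of pages, not the loop itself, which is the point I am making above.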


