Update April 14th 2012: Sergey Brin warns of the threat Facebook poses to the Internet and the open web by keeping its content inaccessible to the rest of the web. But is Facebook a threat to the Internet itself? Or just to Google?
And it’s doing that using free labor: viewers of pages who “Like” or “Recommend” a page using the new Like button.
Here is an example of this search engine (using this very article), already available and directly integrated in Facebook, listing all the web pages external to Facebook that users “Liked” in the “Page” section of the results:
On the technical side, you can see Facebook’s crawler in the access log of your HTTP server.
It has the User Agent set to facebookexternalhit/*
Here is an example of Facebook pinging this very article in the access.log of the Apache server:
22.214.171.124 - - [05/May/2010:08:31:40 -0600] "GET /2010/05/facebook-rivaling-google-by-building-its-own-web-crawler-powered-by-you/ HTTP/1.1" 200 18379 "-" "facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)"
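If you want to pull those visits out of your own log, a few lines of Python will do. This is only a minimal sketch: it assumes the Apache "combined" log format shown above, the function name facebook_hits is mine, and the second sample line is a made-up regular browser request added for contrast.

```python
import re

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def facebook_hits(lines):
    """Return (time, path) pairs for requests made by facebookexternalhit/*."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and match.group("agent").startswith("facebookexternalhit/"):
            hits.append((match.group("time"), match.group("path")))
    return hits


sample = [
    # The log line quoted in this article.
    '22.214.171.124 - - [05/May/2010:08:31:40 -0600] '
    '"GET /2010/05/facebook-rivaling-google-by-building-its-own-web-crawler-powered-by-you/ HTTP/1.1" '
    '200 18379 "-" '
    '"facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)"',
    # Hypothetical browser request, for contrast only.
    '203.0.113.7 - - [05/May/2010:08:32:01 -0600] '
    '"GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for time, path in facebook_hits(sample):
    print(time, path)
```

To run it against a real log, pass the open file instead of the sample list, for example facebook_hits(open("access.log")), adjusting the path to wherever your server writes its log.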
Complex Expensive Mathematical Proprietary Algorithms vs. You
What will the result be?
A search engine more powerful than Google as it will index only real pages that actual users like, and not fake pages.
Google, Bing and all the serious search engines on the market must spend millions of dollars building complex automated Web Crawlers that keep surfing around the clock, retrieving pages, following links and crunching each page to extract relevant information based on keywords to index them.
So far, the ranking of your site has depended on how this complex algorithm works and classifies keywords, with inner workings so obscure and complex that they gave birth to a totally new field of Internet technologies: Search Engine Optimization (aka SEO).
Facebook is moving the power from complex, obscure and proprietary algorithms to the end users, the only people who can really tell if a page is worth reading.
Not only can Facebook tell that a page is worth indexing, it can also tell what its rank should be, just by looking at the number of people who shared that page.
Every time you click on a “Like” button, you effectively tell Facebook: here is the address of a page I read and liked, and it’s worth sharing with others.
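To make the idea concrete, here is a toy sketch of ranking pages by the number of distinct people who liked them. It is my own illustration, not Facebook's actual algorithm, which is not public: the (user, url) event format, the function name rank_by_likes and the sample data are all assumptions.

```python
from collections import Counter


def rank_by_likes(like_events):
    """Rank URLs by the number of distinct users who liked them.

    like_events is an iterable of (user, url) pairs (a hypothetical format).
    """
    # A user clicking "Like" twice on the same page counts only once.
    unique_likes = set(like_events)
    counts = Counter(url for _user, url in unique_likes)
    # Most liked first; ties are broken alphabetically by URL.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up events for illustration.
events = [
    ("alice", "http://example.com/good-article"),
    ("bob", "http://example.com/good-article"),
    ("alice", "http://example.com/good-article"),  # duplicate click
    ("carol", "http://example.com/other-page"),
]

for url, likes in rank_by_likes(events):
    print(likes, url)
```

No keyword analysis is involved at all: the only signal is how many people vouched for the page.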
Facebook is actually building the first successful crowdsourced search engine, and it will be a powerful one with pages chosen by users, getting rid of all the fake websites out there like parked domains and parasite websites that are just exploiting the biggest weakness of search algorithms: they are not human.
Those parasite websites usually do their own web crawling and build pages (sometimes on the fly) using keyword stuffing to lure search engines into thinking their content is relevant in order to achieve higher ranking, then serve lots of ads on the pages to generate revenue from that traffic.
I personally leave those websites right away when I land on them, so I would never click a “Like” button if they had one: it’s too obvious they are just copy-pasting content from somewhere else.
You can’t fool the human eye so easily, so only a small percentage of the people who land on parasite pages will actually “Like” them, while a regular search engine would rank them high based on their keywords.
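The contrast between the two signals can be sketched in a few lines. Everything here is a simplified illustration: real search engines use far more than keyword density, the two page texts are invented, and the visit and like counts are made-up numbers, not measurements.

```python
def keyword_density(text, keyword):
    """Share of the words in text that are exactly keyword (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def like_rate(likes, visits):
    """Share of visitors who clicked "Like"."""
    return likes / visits if visits else 0.0


# Invented page texts for illustration.
stuffed_page = "cheap flights cheap flights book cheap flights now cheap"
original_page = "we compared fares on ten routes to find cheap flights"

# A naive keyword-based score favors the stuffed page...
print(keyword_density(stuffed_page, "cheap"))
print(keyword_density(original_page, "cheap"))

# ...while a human-based signal favors the original one
# (visit and like counts are hypothetical).
print(like_rate(likes=2, visits=1000))
print(like_rate(likes=50, visits=1000))
```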
Not to mention the porn sites.
Who will “Like” or “Recommend” a porn webpage with the link being posted directly to their Facebook profile and broadcast to all their friends in their News Feed?
The rise of Microsoft’s Bing (under the cover of Facebook)
In a way, the “Like” button is how Facebook added a Captcha to all websites so only content worth indexing is being saved for search.
Even more powerful than a Captcha on a webpage, it’s also a Captcha on your brain and morality, as users will not reference questionable websites such as porn sites.
Who will benefit from that?
Facebook of course, but also Microsoft’s own search engine, Bing, which so far has been struggling against Google even after Microsoft spent billions of dollars on it.
Don’t forget that Microsoft invested $240M in Facebook back in 2007 (see Facebook’s press release), and they could well be behind Facebook’s strategy to take over the web.
Bing is already omnipresent in Facebook search and it keeps growing.
At more than a billion clicks per day on the “Like” button, it’s happening really fast.
The consequences are as follows:
- Facebook is referencing a “cleaner” web: it will have an inventory of real pages with fewer parasite websites referenced and more general audience content
- Bing from Microsoft will benefit directly from this crowdsourced search engine.
- The importance of SEO will diminish
Of course people will adapt:
- SEO guys will get on the Facebook bandwagon and their job will be to add a “Like” button to your site (and still charge you a lot for that)
- Parasite websites will have to make their content much nicer to the human eye to fool humans into thinking they have the original content. It probably means that a simple copy of the original content, instead of keyword stuffing, will do better than a page belching lots of content gathered from multiple places.
- Google will be restless (it certainly is already) and will start spending billions to compete with the Microsoft+Facebook alliance.
But don’t forget Google also has its own social network: Orkut.
Could Google’s response rely on finally getting Orkut to take off?
I would start by renaming it to something I don’t have to Google every time for spelling and pronunciation…
While Google will not go away any time soon, the search engine wars are just starting, and the key is to make it social.