Archive for May, 2010

How Facebook checks your account is not fake (and keeps legitimate users out)

Thursday, May 13th, 2010

Facebook boasts not only the most users of any social networking site, but also the most “quality” users (i.e. fewer fake users).

How can Facebook achieve that?

First, they’ve been actively purging fake accounts.
More recently, they started testing your knowledge of your own friends.
Because everybody knows you know all your Facebook friends well, right?

This feature is disguised as a way to check that your account is not being accessed by an unauthorized person logging in from an unrecognized computer.
It kicks in if you try to access your account while traveling, logging in from a location you have never logged in from before.

Not only will you be asked for a Captcha, but Facebook will also present you with a set of 7 photos in which your friends have been tagged, and you will have to recognize them.

You can only skip 2 questions, so you had better know your friends well and be lucky enough to be shown photos with an actual face in them.

Especially when they start asking you questions on tagged pictures like these:

Try it yourself by logging in from a friend’s computer or from a computer abroad, and test your knowledge of your friends.
It’s fun, especially when you get yourself locked out of your own account and have to wait one hour to try again.

Detailed flow

First you are asked to enter a Captcha:

Facebook login verification using photo tagging

Then you are tested on your knowledge of your friends’ pictures:

Facebook login verification using photo tagging

It reads:

In order to proceed, Facebook needs to verify that you are the owner of this account. To do this, please identify the people tagged in the following series of photos.
To pass, you cannot get any answers wrong. If you aren’t sure about a question, please skip it. You can only skip two questions.
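
The rule in that message is simple enough to sketch in a few lines. This is only my reading of the on-screen text, not Facebook's actual code: the function name `passes_photo_quiz` and the convention of using `None` for a skipped question are assumptions made for illustration.

```python
from typing import Optional, Sequence

MAX_SKIPS = 2  # "You can only skip two questions."


def passes_photo_quiz(answers: Sequence[Optional[str]], tagged: Sequence[str]) -> bool:
    """Return True if the answers satisfy the rules quoted above.

    answers[i] is the name the user picked for photo i, or None for a skip.
    tagged[i] is the friend actually tagged in photo i.
    """
    if len(answers) != len(tagged):
        raise ValueError("one answer (or skip) is expected per photo")
    skips = 0
    for answer, truth in zip(answers, tagged):
        if answer is None:
            skips += 1
            if skips > MAX_SKIPS:
                return False  # a third skip fails the test
        elif answer != truth:
            return False  # "you cannot get any answers wrong"
    return True


# Hypothetical series of 7 photos; the friend names are placeholders.
friends = ["ann", "bob", "cat", "dan", "eve", "fay", "gus"]
two_skips = ["ann", None, "cat", "dan", None, "fay", "gus"]
one_mistake = ["ann", "bob", "cat", "dan", "eve", "fay", "ann"]
print(passes_photo_quiz(two_skips, friends))
print(passes_photo_quiz(one_mistake, friends))
```

Under these rules a single wrong guess is worse than a skip, which is why the message tells you to skip when unsure.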

If you fail, you get this:

Facebook login verification using photo tagging

Please come back in a little while
Your answers were not accurate enough.
For security reasons, you are only allowed to authenticate your identity once every hour. Please come back then to try again. Sorry for the inconvenience.

If you succeed after having failed previously, you will be shown the recent login attempts to review.

Facebook login verification using photo tagging

Please review recent activity on your Facebook account
Your account was recently accessed from a location we’re not familiar with. Please review the activity details below.
If anything looks unfamiliar, we’ll help you change your password (this will help prevent people in the future from accessing you account without permission).
Do you recognize the account activity listed above?

Note the funny wording about preventing “people in the future” from accessing your account. Ambiguous. What if I want my future self to access it?

Is it too much?

Although these extensive security measures really do their job of keeping unauthorized persons from accessing your Facebook account, aren’t they a bit too much?

Wouldn’t a more classical method, combining a Captcha with an emailed link to confirm your identity, be enough?
Here you have a Captcha, plus a series of 7 pictures with friends tagged.
You cannot make any mistakes, and you can only skip two questions.
That is overkill for user identification.
Hence the underlying purpose of this flow looks more like deterring fake user accounts than protecting your account from unauthorized logins.

Maybe at some point the only way to add a friend on Facebook will be to go through 7 random pictures, some with your candidate friend in them and some without, and to tell for each one whether he or she is in the picture.

This will certainly upset spammers whose fake accounts befriend people they know nothing about.
But it will also upset real people who have lots of friends simply because they are overeager to add more.

We all have one of those friends, don’t we?
You know, the ones with more than 1,000 friends, who always make you wonder how they know so many people (they probably don’t).

I’m curious to know how well they would do at the photo tag test.
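
A rough back-of-the-envelope sketch gives an idea. It rests on assumptions of mine, not on anything Facebook has published: each of the 7 photos is recognizable independently with the same probability, and the user skips every photo they do not recognize and never guesses wrong. Passing then means having at most 2 unrecognized photos out of 7, which is a binomial sum. The name `pass_probability` is made up for this example.

```python
from math import comb

PHOTOS = 7      # size of the photo series described above
MAX_SKIPS = 2   # skips allowed before failing


def pass_probability(p_recognize: float, photos: int = PHOTOS, max_skips: int = MAX_SKIPS) -> float:
    """Chance of passing if each photo is recognized independently with p_recognize.

    Assumes the user skips every unrecognized photo and never answers wrongly,
    so the test is passed when at most max_skips photos are unrecognized.
    """
    p_unknown = 1.0 - p_recognize
    return sum(
        comb(photos, k) * p_unknown ** k * p_recognize ** (photos - k)
        for k in range(max_skips + 1)
    )


# Print the pass rate for a few hypothetical recognition rates.
for rate in (0.5, 0.7, 0.9):
    print(f"recognize {rate:.0%} of photos -> pass {pass_probability(rate):.1%}")
```

Someone who recognizes only half of the tagged photos passes 29 times out of 128 under this model, which does not bode well for the friend collectors.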

UPDATE Sept. 09 2010:
It seems we were right: Facebook has filed to patent a social Captcha. See the patent application.

Facebook rivaling Google by building its own Web Crawler powered by… You!

Saturday, May 1st, 2010

Update April 14th 2012: Sergey Brin warns of the threat Facebook poses to the Internet and the open web by building up content that is inaccessible from outside. But is Facebook a threat to the Internet itself? Or just to Google?

By giving publishers easy ways to mirror their external pages on Facebook, Facebook is effectively building the most relevant kind of search engine: a semantic search engine.

And it’s doing that using free labor: viewers of pages who “Like” or “Recommend” a page using the new Like button.

Here is an example of this search engine (using this very article), already available and directly integrated in Facebook, listing all the web pages external to Facebook that users “Liked” in the “Page” section of the results:

On the technical side, you can see Facebook’s crawler in the access log of your HTTP server.
Its User-Agent is set to facebookexternalhit/*
Here is an example of Facebook pinging this very article in the access.log of the Apache server:
- - [05/May/2010:08:31:40 -0600] "GET /2010/05/facebook-rivaling-google-by-building-its-own-web-crawler-powered-by-you/ HTTP/1.1" 200 18379 "-" "facebookexternalhit/1.0 (+
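
If you want to spot these visits in your own logs, a small script is enough. The sketch below assumes your server writes Apache's "combined" log format and simply keeps the requests whose User-Agent starts with facebookexternalhit/. The function name `facebook_hits` is mine, and the sample lines are made up for the example (documentation-range IP addresses, placeholder path), not taken from a real log.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

CRAWLER_PREFIX = "facebookexternalhit/"


def facebook_hits(lines):
    """Yield (time, path) for every request made by Facebook's crawler."""
    for line in lines:
        match = COMBINED_LOG.match(line.strip())
        if match is None:
            continue  # not a combined-format line, ignore it
        if not match.group("agent").startswith(CRAWLER_PREFIX):
            continue  # some other visitor
        parts = match.group("request").split()
        path = parts[1] if len(parts) >= 2 else ""
        yield match.group("time"), path


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [05/May/2010:08:31:40 -0600] "GET /some-article/ HTTP/1.1" 200 18379 "-" "facebookexternalhit/1.0"',
    '192.0.2.20 - - [05/May/2010:08:32:02 -0600] "GET /some-article/ HTTP/1.1" 200 18379 "-" "Mozilla/5.0"',
]

hits = list(facebook_hits(SAMPLE_LOG))
for time, path in hits:
    print(time, path)
```

To run it against a real file, pass the open file object instead of `SAMPLE_LOG`; the function only needs an iterable of lines.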

Complex Expensive Mathematical Proprietary Algorithms vs. You

What will the result be?

A search engine more powerful than Google, because it will index only real pages that actual users like, and not fake pages.

Google, Bing and all the serious search engines on the market must spend millions of dollars building complex automated Web Crawlers that keep surfing around the clock, retrieving pages, following links and crunching each page to extract relevant information, based on keywords, to index it.

So far, the ranking of your site has depended on how this complex algorithm works and classifies keywords, with inner workings so obscure and complex that they gave birth to a totally new field of Internet technologies: Search Engine Optimization (aka SEO).

Facebook is moving the power from complex, obscure and proprietary algorithms to the end users, the only people who can really tell if a page is worth reading.
Not only can Facebook tell that a page is worth indexing, it can also tell what its rank should be, just by looking at the number of people who shared that page.

Every time you click on a “Like” button, you effectively tell Facebook: here is the address of a page I read and liked, and it’s worth sharing with others.
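
To make the idea concrete, here is a minimal sketch of ranking by Likes. It is not Facebook's algorithm, which is not public: it assumes each click arrives as a (user, URL) pair and ranks pages purely by the number of distinct users who liked them. The function name `rank_by_likes` and the sample data are hypothetical.

```python
from collections import defaultdict


def rank_by_likes(like_events):
    """Rank URLs by how many distinct users liked them, most liked first.

    like_events is an iterable of (user_id, url) pairs. A user who clicks
    "Like" twice on the same page is only counted once.
    """
    likers = defaultdict(set)
    for user_id, url in like_events:
        likers[url].add(user_id)
    # Sort by descending like count, then by URL to keep ties stable.
    return sorted(
        ((url, len(users)) for url, users in likers.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical clicks on two made-up pages.
events = [
    ("alice", "http://example.com/a"),
    ("bob", "http://example.com/a"),
    ("alice", "http://example.com/b"),
    ("alice", "http://example.com/a"),  # repeated click, ignored
]

ranking = rank_by_likes(events)
for url, likes in ranking:
    print(likes, url)
```

Counting distinct users rather than raw clicks is a design choice on my part: it keeps a single enthusiastic (or fake) account from pushing a page up on its own.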

Facebook is actually building the first successful crowdsourced search engine, and it will be a powerful one with pages chosen by users, getting rid of all the fake websites out there like parked domains and parasite websites that are just exploiting the biggest weakness of search algorithms: they are not human.

Those parasite websites usually do their own web crawling and build pages (sometimes on the fly) using keyword stuffing to lure search engines into thinking their content is relevant and so achieve a higher ranking, then serve lots of ads on those pages to generate revenue from the traffic.
I personally leave those websites right away when I land on them, so I would never click a “Like” button even if they had one, because it’s too obvious they are just copy-pasting content from somewhere else.

You can’t fool the human eye so easily: only a small percentage of the people who land on parasite pages will actually “Like” them, whereas a regular search engine would rank them high based on their keywords.

Not to mention the porn sites.
Who will “Like” or “Recommend” a porn webpage when the link is posted directly to their Facebook profile and broadcast to all their friends in their News Feed?

The rise of Microsoft’s Bing (under the cover of Facebook)

In a way, the “Like” button is how Facebook added a Captcha to all websites, so that only content worth indexing is saved for search.
Even more powerful than a Captcha on a webpage, it’s also a Captcha on your brain and morality, as users will not reference questionable websites like porn.

Who will benefit from that?
Facebook of course, but also Microsoft’s own search engine, Bing, which so far has been struggling against Google even after Microsoft spent billions of dollars on it.
Don’t forget that Microsoft invested $240M in Facebook back in 2007 (see Facebook’s press release), and they could well be behind Facebook’s strategy to take over the web.

Bing is already omnipresent in Facebook search and it keeps growing.

Facebook Bing Search

At more than a billion clicks per day on the “Like” button, it’s happening really fast.

The consequences are as follows:
- Facebook is referencing a “cleaner” web: it will have an inventory of real pages, with fewer parasite websites referenced and more general audience content
- Bing from Microsoft will benefit directly from this crowdsourced search engine.
- The importance of SEO will diminish

Of course people will adapt:
- SEO guys will get on the Facebook bandwagon and their job will be to add a “Like” button to your site (and still charge you a lot for that)
- Parasite websites will have to make their content much nicer to the human eye to fool humans into thinking they have the original content. It probably means that a simple copy of the original content will do better than keyword-stuffed pages belching lots of content gathered from multiple places.
- Google will be (and certainly already is) restless, and will start spending billions to compete with the Microsoft+Facebook alliance.

But don’t forget Google also has its own social network: Orkut.

Could Google’s response rely on finally getting Orkut to take off?
I would start by renaming it to something I don’t have to Google every time for spelling and pronunciation…

While Google will not go away any time soon, the search engine wars are just starting, and the key is to make search social.