CAPTCHA’s are annoying little things that we all have to deal with. They are the Turing Test verification process we all go through to prove we are not robots online. The term CAPTCHA, or Completely Automated Public Turing test to tell Computers and Humans Apart, was coined by Luis von Ahn of Carnegie Mellon University. There were many different versions of CAPTCHAs utilized around the web to stop website from attacks from spam bots or intrusion bots. They proved effective in authenticating users as humans. However, despite its efficacy, von Ahn noticed that while it only took several seconds to type the letters, hundreds of thousands of combined human hours were being wasted each day. He wanted to find a way to harness that collective effort, so he created reCAPTCHA in 2007.
reCAPTCHA was a CAPTCHA software program, just like the others, only reCAPTCHA was free for websites to integrate, resulting in a proliferation of adoption, making reCAPTCHA the internet’s standard CAPTCHA program. The genius behind von Ahn’s business model was that every time a user verified themselves, they were actually creating indexable, digitized text of hundreds of years of books, magazines, journals and newspapers.
How does it create value?
reCAPTCHA works by presenting the user with two words, as opposed to one with similar programs. And unlike other CAPTCHA’s, it is showing you real words from archived text that Optical Character Recognition (OCR) software has been unable to identify. OCR software has been crucial in ‘reading’ scanned books, journals, newspapers etc. so that they can be digitized and indexed online. But OCR software has its limitations; it struggles with many texts that a human has no problem reading, such as old, smudged newspapers or old books where the ink has faded over time.
The first word that the software gives you is known, but the second word is not. If six users all input the same word, then it is a near certainty that it is the correct word, reCAPTCHA can then categorize that word, which the OCR program could not interpret.
How does it capture value?
Von Ahn first partnered with the New York Times to digitize all of its back issues. Within a few months, reCAPTCHA had digitized the previous 20 years of New York Times issues. Within the first year, 440 million words were deciphered; the equivalent of 17,600 books.
In 2009, reCAPTCHA was purchased by Google for an undisclosed amount. Google used it to build its Google Books library, which is now one of the largest digital libraries in the world, thanks to reCAPTCHA. Google has since used reCAPTCHA for other purposes, such as having users identify street names and addresses from Google Maps Street View. reCAPTCHA has proven to be in-keeping the adage so often associated with Google; if you are not paying for the product, you are the product.
reCAPTCHA does not have the same efficacy today when AI-powered bots have become so capable, but it created a huge amount of value for a long period of time. Cybersecurity and text digitization are two entirely disparate things, and you would be hard-pressed to find anything but a tedious association between the two. Yet von Ahn and his team were able to harness the collective effort of hundreds of millions of people – who websites needed to authenticate anyway – to perform an incredibly important task that automated technology was unable to do at the time. Yes, the New York Times paid for this service, but what they were able to digitize in a few months would have taken them much more time and money to do otherwise.
Why aren’t there more?
Why then, I ask, has this ‘hidden crowdsourcing’ concept not been replicated time and time again? It appears to be the perfect business model:
- Serve an online need (e.g. authentication)
- Give it away for free
- Harness the collective effort from serving that need to solve another need that has no effective solution
- Charge for that solution
Von Ahn founded another company – Duolingo – in 2011 to give users free language learning services, while translating the internet with more accuracy than most AI-enabled language translation services. But is he the only person capable of building a business around this unique crowdsourcing method?
Maybe, or maybe we are just not looking hard enough.