Working without even knowing it: ReCaptcha and Duolingo leverage unwitting crowds.

Founded by a Guatemalan crowdsourcing pioneer, ReCaptcha and Duolingo harness tiny bits of mindless human work to scan, categorize, and translate the Internet.

We’ve all seen them. You’re trying to buy a concert ticket or log into a website, and they pop up stubbornly:


Called CAPTCHAs (an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart), the puzzles help websites verify that their visitors are real human beings. In so doing, CAPTCHAs serve an important role in preserving the value that web-based companies create, and have as a result become a valuable service.

The proliferation of the web’s countless services and marketplaces was paralleled by the proliferation of automated software that sought to steal sensitive information. “Bots” using algorithmic key cracking protocols began crawling websites, clicking on buttons, and entering information into forms to spoof rudimentary security measures. In turn, companies were forced to build security measures to block would-be robo-thieves, and word puzzles quickly emerged as a useful tool. While robots are very good at crunching numbers and performing repeated tasks, they are notoriously bad at recognizing distorted words. CAPTCHAs separate the wheat from the chaff. While many companies began developing CAPTCHAs internally, third-party providers also emerged to fill the space in the market.

ReCaptcha, a company conceived by several computer science/engineering professors at Carnegie Mellon University, emerged as an early leader in the space. Luis von Ahn, a Guatemalan entrepreneur and one of ReCaptcha’s founders, believed that many of the world’s problems can be solved by harnessing the power of crowds. Putting this belief into practice, ReCaptcha charges companies for the verification service, but also puts users to work behind the scenes.

Although most people may be mildly annoyed at having to prove they aren’t robots, they can complete these small word-identification puzzles in a matter of seconds. What many people don’t know, however, is that during those mindless few seconds of work they are also helping to digitize the world’s books, one word at a time. ReCaptcha repeatedly compares its users’ answers to gradually verify tricky words from all kinds of scanned texts. With more than 100 million ReCaptchas every day, the work adds up. Gradually, ReCaptcha expanded to include image-identification tasks, which help classify photos across the web (particularly useful for identifying harmful content). It is also building a large machine-learning database that is improving the performance of AI systems.
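To make the idea of comparing users’ answers concrete, here is a minimal sketch in Python. It is an illustration only, not ReCaptcha’s actual method: the function name `consensus` and the thresholds `min_votes` and `min_share` are assumptions invented for this example. A scanned word is accepted once enough people have typed the same thing and they clearly outnumber the dissenters.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_share=0.75):
    """Return the agreed transcription of one scanned word, or None.

    Hypothetical rule for illustration: accept the most common answer once
    it has at least `min_votes` votes and at least `min_share` of all votes.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if not cleaned:
        return None
    word, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None  # still disputed; show the word to more users


print(consensus(["morning", "Morning ", "morning", "moming"]))  # 3 of 4 agree
print(consensus(["upon", "up0n"]))  # too few answers so far
```

Under these assumed thresholds, the first word is settled as "morning" while the second stays open until more people have seen it. Each person contributes a few seconds, and the aggregate does the verifying.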

After selling ReCaptcha to Google in 2009, von Ahn decided to take the lessons he had learned about crowd work and deploy them in a new space. The result was Duolingo, a gamified education app that helps people master new languages in small, manageable chunks. Users can practice their skills in thirteen languages through fun and challenging games without having to invest in long, expensive classes. In addition to being extremely effective, Duolingo is free for users. To make money, the Duolingo team realized that they once again had the opportunity to harness crowds to do important work.

As they progress through the language curriculum, users can elect to play an “immersion” game where they translate small passages of text, which is great real-world practice. What they don’t know, however, is that they are once again translating the vast digital documents of the Internet, a valuable service that Duolingo monetizes by charging content owners per word. What remains to be seen is whether charging for translation will be sustainable (translation is already quickly becoming commoditized). It may well be the case that the company makes the bulk of its money elsewhere (certification tests, for example), but the impact of its crowdwork strategy has already been immense.

Both ReCaptcha and Duolingo demonstrate the utility of hidden crowdsourcing: that is, getting work done through activities that people are already doing. The beauty of this method is that it doesn’t require any additional conscious effort on the part of the individuals in the crowd in order to be effective. At the same time, its utility may be limited to language-based applications, where the value added quickly decreases. The philosophy is certainly appealing, and it will be interesting to see if and how it can be applied in other industries.


5 thoughts on “Working without even knowing it: ReCaptcha and Duolingo leverage unwitting crowds.”

  1. How cool! As you mention in your post, I usually am pretty annoyed about filling in the word-identification forms; I wonder how many users drop out of a purchase or sign-up puzzle at that stage (probably small, but still significant on aggregate) for that very reason. I also wonder whether ReCaptcha could both decrease annoyance and capture even greater value by somehow publicizing or informing users that they are helping to digitize the world’s books. A simple one-liner with hyperlink to a different site could lead users to a page where they can de-code even more text (for free) should they want to help.

  2. I love the business model of ReCaptcha and Duolingo! It’s such a genius way of using crowdsourcing to complete otherwise impossible tasks. It would be interesting to see how Von Ahn expands both companies beyond what’s expected; namely, in addition to including more languages and more content, are there other ways crowds can be used as “free labor” to create scalable impact.

  3. Wow! I had no idea they were using our “free labor” in that way. That’s really interesting! It’s somewhat more of a real-time need, but I wonder if something like that could be used for creating closed-captions for video/audio clips. Granted, it was a couple of years ago, but the last time I worked on something requiring closed captions, I remember the solutions were slow, expensive, inaccurate, or some combination. I wonder if this could be a cheaper and more effective way of creating those captions.

  4. Interesting post! After reading some of the comments on your post, I wonder whether users of both services would be more motivated or understanding if the crowdsourcing were not hidden? What is the benefit to the users of not disclosing this information, whether to the average internet user or a language learner? I know I feel better knowing why I am being hassled to translate a phrase or to participate in translating an article for CNN/Buzzfeed. If anything, this strategy seems to miss a unique opportunity to better motivate the user and potentially increase the quality of the crowdsourced content. It also seems like Duolingo’s crowdsourcing translation platform is not proving to be the main revenue source for the business. What should the lesson be from that? Is it due to the misaligned incentives of its users or is the type of work that much more complicated than the ReCaptcha work?

  5. I also had no idea that the ReCaptchas were used for anything besides verification! Duolingo makes it a little bit more obvious that they’re making money in this way (specific mentions of Buzzfeed and CNN article translations on the web version), but that actually is motivating to me as a language learner, since I realize that I am translating real content. I wonder, like others, whether Duolingo in particular should make this even more obvious, particularly in the app, since it’s at once motivating for language learners and the only income (that I am aware of) for Duolingo?

    I also have to wonder besides the already mentioned closed captioning problems, what other things can be worked on in such a way using crowds. I’m curious to see what Von Ahn comes up with in the future, particularly now since ReCaptchas are starting to experience user backlash.

