Earlier this month, Google acquired Kaggle, a platform that hosts competitions to crowdsource machine learning solutions to company-submitted problems. Companies post a problem and an accompanying dataset to the Kaggle community of some 140,000+ members. The winner receives a prize, historically ranging from $3,000 to $3,000,000. For instance, Deloitte recently posted a $70,000 prize for the algorithm that best predicted which customers would leave an insurance company in the next 12 months.
The Kaggle community consists of both data science experts and less experienced users. In fact, it seems that anyone can create an account: I was able to sign up with my personal email and even had the option of entering a competition. That said, users are exposed to a raw level of transparency that self-polices submission quality. Competition entrants are ranked on a public leaderboard with statistics on their score, their total entries to date, and how their rank has changed over the last week. Furthermore, companies can limit competitors to the top tier of Kagglers through a service called Kaggle Connect.
The community comes to Kaggle for more than just prize money, however. The platform includes a substantial learning component, Kernels, where users can download the latest libraries, learn cutting-edge techniques, engage in discussion, and give and receive feedback on others’ work.
Finally, Kaggle offers companies recruiting exposure via its jobs board.
Value Creation and Capture
Like many crowd-sourcing efforts, Kaggle creates value by allowing companies to affordably tap a long tail of data science talent. But the key value creation element here is the transparency it creates in data science outcomes, which drives both company postings and community engagement from actual experts. The public manner in which leaderboards publish the efficacy of solutions and surface talent encourages user competition and engagement. This has created a certain brand equity around the Kaggle ranking. Not only do users take pride in their rank, but employers have begun asking for it in job applications. This transparency also promotes innovation, as users can easily identify winning approaches, learn from them, and continually hone the latest techniques.
Kaggle captures value through fees from its competition and job postings.
What Went Wrong
In 2015, Kaggle faced earnings pressure, laying off a third of its workforce and shutting down a pivot attempt that provided consulting services to the energy industry. While the team blamed market cyclicality, additional factors were likely at play – for both the consulting service and Kaggle overall:
- Kaggle competition solutions didn’t drive adequate ROI.
- Lack of a recurring need for Kaggle-sourced solutions – possible evidence that clients' internal talent is improving and/or that new data science challenges aren't emerging as frequently as Kaggle needs to sustain its business.
- Deep vertical or industry-level knowledge is needed to solve business problems adequately – something Kagglers may not possess. Palantir ran into a similar problem.
- Competition from clients' in-house data science teams puts Kaggle's business at odds with client success: the better a client gets at data science, the less it needs Kaggle.
- Kaggle’s community can still be tapped by other company recruiting channels.
At the risk of adding another minimally valuable post-mortem critique to the many out there, I would have considered the alternative of productizing a data science toolkit that addresses common analytical pain points surfaced on Kernels – for example, a model or data quality health-grader, a data cleaner for frequently used datasets, or a best-practices checklist. Kaggle has amassed tremendous knowledge from its community of co-innovators, and that knowledge might have had substantial, recurring commercial potential.
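To make the toolkit idea concrete, here is a minimal sketch of what a data quality health-grader could look like. This is purely illustrative: the `health_grade` function, its checks (missingness, duplicate rows, constant columns), and its scoring rule are my own assumptions, not anything Kaggle built or proposed.

```python
from collections import Counter


def health_grade(rows):
    """Grade a tabular dataset (a list of dicts) on common quality issues.

    Returns a score in [0, 1] and a list of detected issues. The three
    checks and the 0.25-per-issue penalty are illustrative choices only.
    """
    issues = []
    n = len(rows)
    columns = rows[0].keys() if rows else []

    # Check 1: columns with a high share of missing values
    for col in columns:
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        if missing / n > 0.2:
            issues.append(f"column '{col}' is >20% missing")

    # Check 2: exact duplicate rows
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    dupes = sum(count - 1 for count in seen.values())
    if dupes:
        issues.append(f"{dupes} duplicate row(s)")

    # Check 3: constant (zero-variance) columns
    for col in columns:
        if len({r.get(col) for r in rows}) == 1:
            issues.append(f"column '{col}' is constant")

    score = max(0.0, 1.0 - 0.25 * len(issues))
    return score, issues


# Example: a tiny, deliberately messy churn dataset
rows = [
    {"age": 34, "plan": "basic", "churned": None},
    {"age": 34, "plan": "basic", "churned": None},
    {"age": None, "plan": "basic", "churned": None},
]
score, issues = health_grade(rows)
print(score, issues)
```

A productized version would presumably run far richer checks (schema drift, outliers, label leakage), but even this skeleton shows why such a tool could be a recurring product rather than a one-off competition deliverable.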