Founded in 2010, Kaggle is a crowdsourcing platform that helps companies find machine learning solutions by hosting predictive modeling and analytics competitions. Competitions focus on predictive analytics problems, ranging from predicting cancer to estimating rental prices. The participants or teams that build the best models, i.e. achieve the highest scores, win the competition: they earn a prize ranging from a few thousand dollars to a million, or get recruited.
Since then, it has attracted 600,000 data crunchers from 194 countries with diverse backgrounds, including data scientists, computer scientists, and biologists. The community is active not only in building models, making over 35,000 submissions per day, but also in exchanging ideas and sharing the latest machine learning algorithms. For example, Geoff Hinton and George Dahl introduced the community to deep neural networks, and Tianqi Chen implemented a well-known algorithm with strong predictive power and shared the implementation with the community. As the community grows, its members sharpen their analytics skills and build credentials on Kaggle. Many Kagglers have joined companies ranging from DeepMind to Walmart, and Kaggle itself was recently acquired by Google.
How does it work?
When a company wants to use machine learning to tackle a business problem, it can hire an analytics solution provider or turn to the crowd. Netflix is one company that has benefited from crowdsourcing machine learning algorithms. It hosted a competition with a grand prize of 1 million USD, attracted brilliant minds, such as researchers from AT&T Labs and Yahoo!, and improved the accuracy of its recommendation algorithm by 10%.
Kaggle gives companies that want to crowdsource solutions access to skilled data scientists by hosting machine learning competitions. To help companies set up a competition, Kaggle offers consulting services, such as defining a valuable problem given the data the company provides and preparing the data for the competition. For example, to evaluate the performance of teams, Kaggle needs to set aside some of the data as a test dataset and define the metrics used to score the accuracy of the predictions that participants submit.
After a competition is launched, Kaggle monitors it and provides tools that help participants experiment with various algorithms. Participants build models locally or online using Kernel, submit their predictions for the test set, and receive a score for each submission. Leaderboards post the scores of all teams, which gives teams an incentive to keep improving their models. Participants can choose to share their scripts, i.e. the code they use to analyze the data, publicly and discuss their problems in the forums.
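The scoring loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kaggle's actual implementation: the function names (`rmse`, `leaderboard`), the team names, the data, and the choice of root mean squared error as the metric are all hypothetical. The host keeps the test labels hidden, scores each submission against them, and ranks the teams.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values (lower is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("a submission must contain one prediction per test row")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def leaderboard(hidden_labels, submissions):
    """Score every team's predictions and return (team, score) pairs, best first."""
    scores = {team: rmse(hidden_labels, preds) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out labels that only the competition host can see.
hidden_labels = [3.0, 5.0, 2.0, 4.0]

# Hypothetical submissions: each team predicts a value for every test row.
submissions = {
    "team_a": [3.0, 5.0, 2.0, 4.0],
    "team_b": [2.0, 5.0, 3.0, 4.0],
    "team_c": [4.0, 4.0, 4.0, 4.0],
}

for rank, (team, score) in enumerate(leaderboard(hidden_labels, submissions), start=1):
    print(f"{rank}. {team}: RMSE = {score:.3f}")
```

Because participants never see the hidden labels, the only feedback they get is the score, which is what makes the leaderboard a fair basis for comparison in this sketch.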
Once the competition is closed, the competition host awards the prize to the winner. Kaggle assists the company in obtaining an IP license and integrating the winning model into its business. If the competition is for recruiting purposes, the company can screen participants based on their performance, i.e. their score and place on the leaderboard, and on their scripts.
Kaggle creates value by crowdsourcing brilliant minds to help businesses solve problems with predictive analytics, while fostering the growth of the machine learning community. Crowdsourcing can create substantial value in this setting because a predictive modeling task has no single correct solution or algorithm: the more participants there are, the more likely it is that one of them finds a very strong model. In addition, given a set of metrics, it is easy to evaluate models and screen for the best.
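The intuition that a larger crowd is more likely to produce a strong model can be made concrete with a rough back-of-the-envelope calculation. The sketch below rests on a simplifying assumption that is not from any Kaggle data: each participant independently has the same small probability `p` of building a model that clears some quality bar. The function name `chance_of_success` and the value of `p` are hypothetical.

```python
def chance_of_success(p, n):
    """Probability that at least one of n independent participants succeeds,
    when each succeeds with probability p."""
    return 1.0 - (1.0 - p) ** n


# Assumed per-participant success probability of 1% (illustrative only).
p = 0.01
for n in (1, 10, 100, 1000):
    print(f"{n:>4} participants: {chance_of_success(p, n):.3f}")
```

In practice participants are not independent, since they share ideas and scripts, so this should be read only as an illustration of why the odds improve with participation, not as an estimate.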
Kaggle also helps connect companies with talented data scientists. Because data science is a relatively new field, many data scientists are self-taught; they can build up their profiles on Kaggle and attract recruiters.
Kaggle captures value by charging a platform licensing fee, a consulting fee for problem setup, and fees for additional services such as custom evaluation metrics. For recruiting competitions, Kaggle also charges a recruiting fee per hire.
In addition to providing full service for setting up competitions, Kaggle has developed tools for participants, such as Kernel, and is dedicated to building and maintaining a healthy community. Kernel is a product that lets data scientists build and test models online easily. It also allows them to share their code publicly and ask the community for feedback, which makes the platform more open and encourages participants to exchange ideas and learn from each other. Participants are therefore more likely to stay with the community, because they can always learn new things and strengthen their analytical skills. Furthermore, when choosing projects, Kaggle makes sure that each challenge is both interesting and approachable, which encourages more data scientists to participate and contributes to the stickiness of the community.
Now that Kaggle has joined Google, it has access to Google Cloud, which allows it to hold competitions that require large computational resources. This access broadens the variety of analytics tasks that can be crowdsourced, and positions the Kaggle community to play an even bigger role in the advancement of AI.