In 2006, Netflix launched the Netflix Prize, “a machine learning and data mining competition for movie rating prediction.” Netflix hoped the $1 million prize would encourage a range of algorithmic solutions to improve the accuracy of the company’s existing recommendation program, Cinematch, by 10%. Cinematch used “straightforward statistical linear models with a lot of data conditioning” and served two major purposes: it acted as a competitive differentiator by recommending unfamiliar movies to customers and, more importantly, it enabled Netflix to reduce demand for blockbuster new releases and improve asset turns across all DVDs in inventory.
Project Scope: Netflix defined the project scope as a 10% reduction of the “root mean squared error” (RMSE) from Cinematch’s existing 0.9525, and set a series of guidelines for competition participants. Netflix provided a dataset containing 100 million anonymous movie ratings for participants to test their algorithms.
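As a rough illustration of the target, the sketch below computes RMSE and the percentage improvement over a baseline. The function names (`rmse`, `improvement_pct`) and constants are illustrative assumptions, not anything from Netflix's tooling; only the 0.9525 baseline and the 10% goal come from the contest description above.

```python
import math

CINEMATCH_RMSE = 0.9525    # baseline quoted in the project scope
TARGET_REDUCTION = 0.10    # the 10% goal

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))

def improvement_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# RMSE a submission had to reach: 0.9525 * 0.90 = 0.85725
target_rmse = CINEMATCH_RMSE * (1 - TARGET_REDUCTION)
```

Because RMSE squares each error before averaging, a few badly wrong predictions cost more than many slightly wrong ones, which is part of why the last fractions of a percent were so hard to win.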
2007 Progress Prize: In 2007, the BellKor team, made up of three employees from the Statistics Research group at AT&T Labs, achieved an 8.43% improvement over Cinematch and was awarded the first of two $50,000 Progress Prizes. The team spent nearly 2,000 hours developing a final solution that contained 107 algorithms and achieved an RMSE of 0.8712. Netflix engineers investigated the source code (a requirement for the prize) and identified the two best-performing algorithms of the 107: Matrix Factorization (often referred to as Singular Value Decomposition, or SVD) and Restricted Boltzmann Machines (RBM). “A linear blend” of the two algorithms was ultimately put to use in Netflix’s recommendation system. For the 2008 Progress Prize, the company set a goal of a 1% improvement over BellKor’s solution.
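To make the idea of a linear blend concrete, here is a minimal sketch that combines the predictions of two models with a single weight fitted by least squares. It is not Netflix's or BellKor's actual method: the text above does not say how the blend weights were chosen, so the closed-form fit, the constraint that the two weights sum to 1, and the sample numbers are all assumptions made for illustration.

```python
def fit_blend_weight(pred_a, pred_b, actual):
    """Return w minimizing sum((w*a + (1 - w)*b - y)**2) over held-out ratings."""
    numerator = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    denominator = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if denominator == 0:
        return 0.5  # both models agree everywhere, so any weight gives the same blend
    return numerator / denominator

def blend(pred_a, pred_b, w):
    """Blend two prediction lists as w*a + (1 - w)*b."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

def sum_squared_error(predicted, actual):
    return sum((p - y) ** 2 for p, y in zip(predicted, actual))

# Hypothetical held-out predictions from two models (made-up numbers).
svd_preds = [3.8, 2.1, 4.6, 3.0]
rbm_preds = [3.2, 2.9, 4.0, 3.6]
true_ratings = [4, 2, 5, 3]

w = fit_blend_weight(svd_preds, rbm_preds, true_ratings)
blended = blend(svd_preds, rbm_preds, w)
```

On the data used to fit the weight, the blend's squared error can never exceed that of either model alone, since w = 1 and w = 0 recover the individual models. The gain is largest when the two models make different kinds of mistakes.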
2008 Progress Prize: The BellKor in BigChaos team (the original BellKor team joined by colleagues from Commendo Research) won the second Progress Prize with an RMSE of 0.8627, a 9.44% improvement over Cinematch.
2009 Grand Prize: With 24 minutes remaining before the close of the three-year contest, BellKor’s Pragmatic Chaos, a further expansion of the original BellKor team, submitted the solution that ultimately won the $1 million grand prize. The final set of algorithms achieved an RMSE of 0.8567, a 10.06% improvement over Cinematch.
Over its three years, the contest drew 51,051 contestants on 41,305 teams, representing 186 countries. Netflix ultimately received 44,014 valid submissions from 5,169 teams.
At each stage of the contest, the original BellKor team expanded by adding new members or, in the case of the final submission, an entire competing team, to further refine its algorithms and push past a 10% improvement. Second-place finisher “The Ensemble” also formed as a collection of teams that had submitted individual solutions earlier in the contest.
Wired magazine summarized this crowdsourcing within a crowdsourcing competition: “The secret sauce for both BellKor’s Pragmatic Chaos and The Ensemble was collaboration between diverse ideas, and not in some touchy-feely, unquantifiable, ‘when people work together things are better’ sort of way. The top two teams beat the challenge by combining teams and their algorithms into more complex algorithms incorporating everybody’s work. The more people joined, the more the resulting team’s score would increase.”
“At first, a whole lot of teams got in — and they got 6-percent improvement, 7-percent improvement, 8-percent improvement, and then it started slowing down, and we got into year two. There was this long period where they were barely making progress, and we were thinking, ‘maybe this will never be won…’ Then there was a great insight among some of the teams — that if they combined their approaches, they actually got better. It was fairly unintuitive to many people [because you generally take the smartest two people and say ‘come up with a solution’]… when you get this combining of these algorithms in certain ways, it started out this ‘second frenzy.’ In combination, the teams could get better and better and better,” explained Netflix chief product officer Neil Hunt.
While it was still an independent team, Pragmatic Theory discovered that the number of movies rated by an individual on a given day could be used as an indicator of how much time had passed since the viewer watched the movie. They also tracked “how memory affected particular movie ratings.” (ed. note: unclear how this was done). Although this discovery was not particularly successful on its own at achieving the >10% improvement, when combined with BellKor’s algorithms, it gave the new team a slight edge over the competition.
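The per-day rating count described above is simple to derive from raw rating records. The sketch below shows one way to attach it to each rating as an extra feature. The record layout (user, movie, date, rating), the function name, and the sample data are hypothetical, and how Pragmatic Theory actually fed this signal into their models is not described here.

```python
from collections import Counter

def add_daily_rating_counts(ratings):
    """Append to each (user, movie, date, rating) record the number of
    ratings that user submitted on that same date."""
    ratings = list(ratings)  # materialize so the records can be scanned twice
    per_user_day = Counter((user, day) for user, _movie, day, _rating in ratings)
    return [
        (user, movie, day, rating, per_user_day[(user, day)])
        for user, movie, day, rating in ratings
    ]

# Hypothetical records: user 1 rates three movies in one sitting, user 2 rates one.
sample = [
    (1, "movie_a", "2005-03-01", 4),
    (1, "movie_b", "2005-03-01", 3),
    (1, "movie_c", "2005-03-01", 5),
    (2, "movie_a", "2005-03-01", 2),
    (1, "movie_d", "2005-03-02", 4),
]
with_counts = add_daily_rating_counts(sample)
```

A high count suggests a bulk rating session covering movies seen some time ago, while a count of one is more likely to follow a recent viewing. That is the intuition behind using the count as a proxy for elapsed time.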
The Netflix Prize demonstrates the power of crowdsourcing in developing innovative solutions for complex problems. Further, it’s an interesting example of how setting various stages in the competition can help further push teams to achieve new success by combining their solutions with other contestants.