On a recent trip to Berkeley, I searched Yelp for “$$” restaurants and ran into a common problem – almost all of the top results seemed to have 4-star ratings (with hundreds or thousands of reviews). After 20 minutes of sifting through long-winded, often contradictory opinions, I felt frustrated and chose a place randomly.
Yelp and sites like Amazon are commonly used to discover the best choice among dozens of similar options. Over time, these review systems have accumulated hundreds of millions of crowdsourced data points and should theoretically enable better and more efficient decision making among users.
Then why does it seem like every decent place on Yelp has a 4 or 4.5 star rating? To test whether this was true, I scraped data from Yelp for the first 1000 hits for restaurants in Berkeley. For every restaurant, I recorded the price range, average rating, and number of reviews.
After throwing out restaurants with fewer than 20 reviews, 86.4% of the remaining restaurants had between a 3.5 and 4.5 star rating, and 74.4% had either a 3.5 or 4.0 star rating.
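The filter-and-bucket analysis is simple enough to sketch in a few lines. The rows below are hypothetical stand-ins for the scraped (name, stars, review count) records, not the actual Berkeley data:

```python
# Sketch of the analysis: drop low-review restaurants, then measure
# what share of the rest falls in the 3.5–4.5 star band.
# The rows here are made up for illustration.
restaurants = [
    ("Place A", 4.0, 512),
    ("Place B", 3.5, 98),
    ("Place C", 4.5, 1200),
    ("Place D", 5.0, 12),   # dropped: fewer than 20 reviews
    ("Place E", 2.5, 45),
]

kept = [r for r in restaurants if r[2] >= 20]
mid_band = [r for r in kept if 3.5 <= r[1] <= 4.5]
share = 100.0 * len(mid_band) / len(kept)  # 75.0 for this toy sample
```

On the real scrape, the same computation produced the 86.4% figure above.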
So, despite having such a rich data set, Yelp does not succeed at placing restaurants on a true 5-star spectrum of quality (equivalent to a 10-point scale, since there are half stars). Rather, it tends to place them in a few broad buckets: “decent”, “pretty good”, and “very good”.
Oftentimes, you’ll need to make a choice between 5 or 6 “very good” options, and Yelp will leave you with written reviews to sort through. How does Yelp deal with this challenge? With hundreds of reviews available for popular locations, the order in which reviews are presented ends up being critical to how a restaurant is perceived. Not surprisingly, Yelp has faced many allegations of selectively reordering reviews for businesses whose owners declined to pay up, resulting in over 2000 complaints to the FTC through last year.
Given the inefficiency of distinguishing between options on Yelp, it’s not surprising that a new generation of curated discovery sites has begun springing up. Product Hunt and Gogobot are up-and-coming examples. Other services such as the HBS startup Thrsday offered to take the headache out of planning an evening by curating the entire thing for you and a date.
Is hand-curated content the final form of discovery, though? I don’t believe so. Yelp has a treasure trove of data stacked up, but they just aren’t using it in a way that makes discovery, and ultimately comparison between similar options, as easy as possible.
Instead of an average star rating, which tends to unhelpfully cluster results, users should be able to quickly see how others with similar likes specifically ranked a subset of choices. Such a dataset could be generated by presenting reviewers with a series of binary choices between establishments within specific subcategories.
Well-established algorithms could be used to produce stack rankings of establishments from a large number of binary choices by reviewers. Rather than being static, these rankings could also be tailored to each user’s tastes based on their own binary selections. The end result could be a personalized rating, similar to what Netflix produces by comparing a viewer’s preferences against the broader crowdsourced dataset.
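To make the idea concrete, here is a minimal sketch of one such well-established approach: an Elo-style rating update (the same family of methods used for chess rankings; Bradley–Terry model fitting would be another option). The restaurant names, comparison data, and the K step size are all hypothetical:

```python
# Elo-style stack ranking from binary "which do you prefer?" choices.
# All data below is illustrative, not from Yelp.
from collections import defaultdict

K = 32  # update step size (assumed; would be tuned on real data)

def expected_score(r_a, r_b):
    """Modeled probability that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rank_from_choices(choices, base=1500.0):
    """choices: list of (winner, loser) pairs from binary questions."""
    ratings = defaultdict(lambda: base)
    for winner, loser in choices:
        e_w = expected_score(ratings[winner], ratings[loser])
        # Winner gains, loser loses, proportional to how surprising the
        # outcome was under the current ratings.
        ratings[winner] += K * (1.0 - e_w)
        ratings[loser] -= K * (1.0 - e_w)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical comparisons within one subcategory (e.g. tacos):
choices = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
ranking = rank_from_choices(choices)  # "A" ends up ranked first
```

Personalization would then fall out naturally: run the same update using only choices from users with similar tastes, rather than the whole crowd.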
Yelp’s competitive advantage is entirely based on its proprietary mountain of user-generated reviews, but as the sheer amount of data to sift through grows over time, Yelp’s user interface will become less and less useful in its current form. Users will also become less patient and demand that the most relevant information be presented quickly and efficiently. Since modifying the discovery process may damage Yelp’s existing business model, they are unlikely to make the changes necessary to fully optimize the user experience. This leaves the door wide open for new competitors who can somehow bridge the data gap.