Scale: Labeling the Future of Data - Digital Innovation and Transformation


Scale: Labeling the Future of Data

Founded in 2016, Scale is a San Francisco-based startup that helps its clients label and curate data for artificial intelligence applications. It is led by its founder and CEO, Alexandr Wang, who dropped out of MIT. Scale is increasingly moving beyond data labeling to offer software-based services that help its fast-expanding portfolio of customers gather, annotate, curate, and clean up their data, as well as build and monitor machine learning models trained on that data[i]. The company was valued at $7.3 billion after raising $325 million this week. The rising startup has an impressive list of investors, including Dragoneer, Tiger Global, Wellington Management, Durable Capital, Index Ventures, Founders Fund, and Y Combinator. Its CEO thinks of data as the new code and as the building block for the next wave of AI-based technology.


In a recent interview with Fortune, its CEO shared that “the company is currently on track to earn $100 million in revenue in a twelve-month period and that sales have doubled in the past year”. Both legacy companies and fast-growing newcomers are signing up as Scale customers. Notable clients include PayPal, Pinterest, the U.S. Air Force, Toyota, General Motors, Lyft, Brex, SAP, NVIDIA, and Samsung[ii]. While other startups, such as Samasource, Labelbox, Hive, Cloudfactory, and DataLoop, also focus on data preparation and on services for training A.I. systems with that data, Scale is rapidly emerging as the most likely winner in this increasingly relevant market.

The main challenge Scale faces is that it is providing the basic, early tools needed to build a whole new industry by reinventing how our digital world is built. Its goal is to move machine learning into the mainstream by replacing a specialized resource constraint (software engineers) with a commoditized one (data labelers). The people at Scale believe this can significantly accelerate technological progress on a global scale: the growth rate of software will accelerate over the next decade as more software is created by machine learning through metaprogramming (developers write programs that write programs), and the key bottleneck for machine learning today is the lack of labeled data[iii].

This challenge, being a first mover into an uncharted industry, is in itself also a powerful opportunity given the nature of the market. If data labeling and overall curation for machine learning becomes a winner-take-all market, Scale could be positioning itself to capture it very early in the game and become one of the most relevant companies to emerge this decade. Four factors are especially important in classifying this as a winner-take-all market and in viewing Scale’s first-mover position as more of an opportunity than a challenge:

1. Network Effects: Data labeling has high direct network effects, because the company with more customers will have more data and will therefore be better able to refine and train its data labeling and curation processes. Companies will move toward the provider with the best processes, which will most likely be the one with the most and best data processing experience, and thus the one with the most customers. This creates a virtuous cycle for the service provider that grows fastest, benefiting early market entrants such as Scale.
2. Multihoming: Once a company has shared its data and integrated its key machine learning processes with a certain provider, it might be difficult or unwarranted to switch to another provider. This creates high multihoming costs for the provider’s clients, resulting in product stickiness that benefits early entrants like Scale if they do a good job in their early work.
3. Differentiation: Data labeling and curation processes and services can be highly commoditized when handled by the appropriate talent. The market therefore has a low level of differentiation, which significantly benefits early movers such as Scale.
4. Economies of Scale: Supply-side economies of scale could create a defensive moat that reinforces the demand-side economies of scale driven by network effects. Scale attracts top talent, and that talent considerably improves the infrastructure needed to provide best-in-class services at the edge of technological advances in machine learning.

Given this quick market analysis, and especially the strong network effects and non-differentiated nature of Scale’s product, Scale may be poised to conquer this future-defining market and ride this next decade to become a central piece of our day-to-day lives by labeling the data that will drive the world around us.

[i] Jeremy Kahn, Fortune, April 13, 2021, “Data-labeling company Scale AI valued at $7.3 billion with new funding”. URL: https://fortune.com/2021/04/13/scale-ai-valuation-new-funding-fundraising-data-labeling-company-startups-vc/#:~:text=Scale%20AI%2C%20a%20San%20Francisco,in%20the%20latest%20investment%20round.

[ii] Scale: Customers. URL: https://scale.com/customers

[iii] Scale: What we do. URL: https://scale.com/about

Student comments on Scale: Labeling the Future of Data

April 19, 2021 Omar Aboulezz says:

As I read this post, what kept coming to mind was how easy it would be to imitate this service. You referenced this in #3 above, differentiation, which I agree with. How much of a threat do you think it would be for companies to develop their own data labeling services in-house? This seems like a fairly easy task that can be outsourced to low-wage areas, like we saw with Affectiva.

Data prep is a necessary part of the ML process and can be considered a part of the digital supply chain. Super cool to see a company try to make it its own niche. I think this service can help smaller companies, but it will be under threat from larger, well-resourced companies.
