In recent years, machine learning and artificial intelligence have gained mass popularity, largely because of the promise they hold for numerous industries and sectors. For organizations that collect substantial amounts of data from their users, machine learning provides a unique opportunity not only to analyze this data, but also to recommend actions based on it with minimal or no human input, which makes it an extremely attractive technology. AlmaPact, an HBS-founded student financing software startup, is a specific example of how machine learning can be applied to data to make recommendations better than a human expert could.
The AlmaPact platform connects students to investors and creates income share agreements (ISAs) between the two parties. Under an ISA, the student receives an upfront lump sum from the investor and, in return, pays the investor a specific percentage of their income for a predetermined number of years after they finish school. The most difficult aspect of this value proposition for both parties is that there is currently no accurate and reliable way to predict the future earnings of the students in whom investors are investing. From the student's perspective, the higher their predicted income, the better their income share terms will be: they will be asked to pay a lower percentage of their income over a shorter time period. However, if the income prediction algorithm overestimates the student’s post-graduation income, investors are penalized, receiving a much smaller payback than was predicted when they invested in the student.
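To make the investor's exposure concrete, here is a minimal arithmetic sketch of an ISA payback. Every figure in it (the lump sum, the 10% income share, the five-year term, and both income levels) is a hypothetical value chosen for illustration, and the flat-income assumption is a simplification; none of this reflects actual AlmaPact terms.

```python
def isa_payback(yearly_incomes, income_share_percent):
    """Total paid to the investor: a fixed percentage of each year's income."""
    return sum(income * income_share_percent / 100 for income in yearly_incomes)


# Hypothetical terms, for illustration only.
lump_sum = 30_000            # upfront amount the student receives
income_share_percent = 10    # share of income paid to the investor
term_years = 5               # number of years of payments after school

# Hypothetical incomes, assumed flat over the term for simplicity.
predicted_income = 80_000    # what the model predicted
actual_income = 60_000       # what the student actually earns

predicted_payback = isa_payback([predicted_income] * term_years, income_share_percent)
actual_payback = isa_payback([actual_income] * term_years, income_share_percent)
shortfall = predicted_payback - actual_payback

print(f"Predicted payback: {predicted_payback:,.0f}")
print(f"Actual payback:    {actual_payback:,.0f}")
print(f"Shortfall:         {shortfall:,.0f}")
```

In this made-up case, an income prediction that is too high by a quarter erases the investor's entire expected gain over the lump sum, which is why prediction accuracy matters so much to the investor side of the platform.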
Machine learning presents an enormous opportunity to solve this problem. As the population of students and investors executing income share agreements grows, the platform accumulates both the inputs to the income prediction model and the real-world outcomes of student job placement. By comparing the two, a machine learning model can rapidly learn which characteristics are strong predictors of future income (e.g. education program, educational institution, pre-matriculation career) and which are weak, or even negative, predictors (e.g. parents’ income, hometown). The more data the platform ingests, the more accurately the model can predict income, a self-reinforcing value creation loop that is a hallmark of machine learning. An additional network effect makes this loop even more attractive: as the model refines itself and predicts income more accurately, the returns of the ISA become more predictable and less volatile, which attracts more investors to the platform. More investors, in turn, mean that more students are able to get ISAs. This two-sided network effect is only possible with a robust machine learning algorithm.
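As a rough sketch of what separating strong from weak predictors could look like, the snippet below ranks two features by how strongly they correlate with income. It uses a plain Pearson correlation as a simple stand-in for a real model's feature importance, and it runs on synthetic data that is deliberately constructed so that one feature drives income and the other does not. The feature names (program_score, hometown_code) and all numbers are hypothetical, so the output demonstrates the mechanics only and is not evidence about real students.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_students = 500

# Synthetic, hypothetical features scaled to the range [0, 1).
program_score = [rng.random() for _ in range(n_students)]
hometown_code = [rng.random() for _ in range(n_students)]

# By construction, income depends on program_score plus noise
# and does not depend on hometown_code at all.
income = [40_000 + 30_000 * p + rng.gauss(0, 5_000) for p in program_score]

strength = {
    "program_score": pearson(program_score, income),
    "hometown_code": pearson(hometown_code, income),
}

# Rank features by absolute correlation, strongest first.
ranked = sorted(strength, key=lambda name: abs(strength[name]), reverse=True)
for name in ranked:
    print(f"{name}: correlation with income = {strength[name]:+.2f}")
```

A production model would need to handle categorical inputs, interactions between features, and far noisier outcomes, but the underlying idea is the same: observed outcomes reveal which inputs carry predictive signal.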
To act on the promise of machine learning, we (the founding team) are trying to get as much data onto the platform as we can right away, focusing on a few key data sources, so that we can build a rough prototype of the model. In the near term, we need to make sure that we can keep feeding data to the model so that it can continue to train and improve its ability to predict income. In the long term, this will create an enormous barrier to entry for competitors, because our model will be based on years of data and will be very difficult to replicate without years of investment. This barrier to entry is one of the most promising aspects of the business, which is why we are putting much of our focus right now on how to start acquiring and analyzing data. However, this is not a sure bet: we don’t yet know whether there is in fact a combination of data points that will allow our model to closely predict income. There will almost certainly be some volatility in our predictions, and one of our core assumptions is that the market will tolerate this volatility.
In the context of AlmaPact, I would love to hear from my classmates which data sources they intuitively think may be strong predictors of income, and which types of data we should use first to train our machine learning model. I would also like to hear whether my classmates see other aspects of the business that could be strong candidates for machine learning, beyond the income prediction application discussed in this essay.