Machine Learning Reduces Dropouts at the University of Arizona

The University of Arizona has leveraged machine learning to push student retention to an all-time high, but some critics have raised concerns about data privacy.

Student retention is one of the most significant issues facing higher education institutions today: in the U.S., only 60% of students graduate within six years. [1] Machine learning is a valuable tool that universities can use to improve their student support and retention processes. For the University of Arizona (“UA”), the value of machine learning lies in its predictive power: it allows the university to extract new features that, in turn, enable faster and more accurate identification of students who are at risk of dropping out.

New Feature Extraction

Machine learning helps UA identify previously unmeasured variables that may influence student retention. Traditional approaches to data analysis focus only on highly measurable information, such as students’ course grades or demographic data. Machine learning provides an opportunity to move beyond these. For example, UA is leveraging location and purchase data from student swipe cards that offer insight into students’ non-academic lives (see below). Machine learning will help identify which of these factors correlate with student retention and improve UA’s ability to identify at-risk students accurately.
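Before any model is trained, a common first pass is to screen each candidate feature by how strongly it moves with the outcome. The Python sketch below ranks hypothetical swipe-card features by their Pearson correlation with a 0/1 retention label (with a binary outcome this is the point-biserial correlation). The feature names and the six-student data set are made up for illustration, and this is only a screening step, not UA’s actual model, which could also capture interactions and non-linear effects that a pairwise correlation misses.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 outcome this equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no signal
    return cov / (sx * sy)

def rank_features(features, retained):
    """Return (name, r) pairs sorted by |r|, strongest association first."""
    scores = {name: pearson(values, retained) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up toy data for six students (NOT real UA data); 1 = returned for year two.
retained = [1, 1, 0, 1, 0, 1]
features = {
    "gym_visits_per_week":     [3, 4, 0, 2, 1, 3],   # hypothetical feature
    "library_visits_per_week": [2, 1, 1, 3, 0, 2],   # hypothetical feature
    "hours_since_last_swipe":  [5, 8, 40, 6, 30, 7], # hypothetical feature
}

for name, r in rank_features(features, retained):
    print(f"{name:26s} r = {r:+.2f}")
```

Correlation only flags candidates: it says nothing about causation, and a strongly correlated feature may simply stand in for another one.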

Faster Response

The second reason machine learning is critical for UA is that it allows the university to intervene much sooner to support struggling students. According to UA researcher Sudha Ram, “freshmen who ultimately leave the university make the decision to do so in the first 12 weeks…long before final grades are posted.” [2] By expanding the range of factors to include data beyond grades, UA will be able to identify at-risk students earlier, increasing the likelihood that an intervention will be effective.

Current Strategies

UA is already leveraging machine learning to great effect: the 2017-18 one-year retention rate of 86.5% was the highest in the university’s history and exceeded the targets set by the Arizona Board of Regents. [3] In the short term, UA is expanding the types of measurable data it analyzes; its analysis now includes nearly 800 factors in addition to academic performance. [4] For example, Assistant Provost Angela Baldasare confirms that “in the financial aid world, we found some early indicators.” [5] This expansion into features beyond academic performance makes UA’s predictive models more accurate.

In the medium term, UA is exploring ways to incorporate less observable data by collecting location and purchasing information from students’ ID cards. Researchers led by Dr. Ram use this transactional data to infer how well a student is socially integrated into campus life. For example, by analyzing how regularly students use campus facilities such as the gym or library, researchers can determine whether students have developed strong routines. By mapping card transactions that occur close together in time and at the same location, researchers can make inferences about students’ implicit friend groups and social networks. [6] Both factors have been shown to substantially improve the accuracy of the model’s predictions. According to Dr. Ram, the current model is “able to do a prediction at the end of the first 12 weeks of the semester with 85 to 90 percent recall,” [7] meaning it flags 85 to 90 percent of the students who go on to leave. These efforts are currently limited to research using de-identified data, but within the next two to ten years these techniques could be put into practice, to the great benefit of students.
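The two social-integration signals described above can be sketched in Python under simple assumptions. This is not the method from Ram et al.: the record format (student ID, location, minutes since the start of term), the two-minute window, the three-encounter threshold, and all function names are hypothetical. Two students count as co-located when they swipe at the same location within the window, a pair seen together repeatedly is treated as a probable tie, and routine strength is approximated as the share of weeks in which a student visits a given facility.

```python
from collections import Counter, defaultdict

MINUTES_PER_WEEK = 7 * 24 * 60

def co_occurrences(swipes, window=2):
    """Count, for each pair of students, how often they swiped at the same
    location within `window` minutes of each other (hypothetical rule)."""
    by_location = defaultdict(list)
    for student, location, minute in swipes:
        by_location[location].append((minute, student))
    pairs = Counter()
    for events in by_location.values():
        events.sort()  # order each location's swipes by time
        for i, (t_i, s_i) in enumerate(events):
            for t_j, s_j in events[i + 1:]:
                if t_j - t_i > window:
                    break  # later swipes are even further away in time
                if s_i != s_j:
                    pairs[tuple(sorted((s_i, s_j)))] += 1
    return pairs

def likely_ties(pairs, min_count=3):
    """Hypothetical rule: a pair seen together at least `min_count` times is a probable tie."""
    return {pair for pair, count in pairs.items() if count >= min_count}

def weekly_regularity(swipes, student, location, n_weeks):
    """Share of the first `n_weeks` weeks in which `student` swiped at `location`."""
    weeks = {minute // MINUTES_PER_WEEK
             for s, loc, minute in swipes
             if s == student and loc == location and minute // MINUTES_PER_WEEK < n_weeks}
    return len(weeks) / n_weeks

# Made-up swipe records (NOT real data): (student_id, location, minutes since start of term).
swipes = [
    # A and B repeatedly enter the dining hall within a minute or two of each other.
    ("A", "dining_hall", 720), ("B", "dining_hall", 721),
    ("A", "dining_hall", 2160), ("B", "dining_hall", 2161),
    ("A", "dining_hall", 3600), ("B", "dining_hall", 3602),
    # C uses the same dining hall, but never close in time to anyone else.
    ("C", "dining_hall", 3650),
    # A visits the gym in each of the first three weeks; C visits once.
    ("A", "gym", 1000), ("A", "gym", 11000), ("A", "gym", 21000),
    ("C", "gym", 1030),
]

pairs = co_occurrences(swipes)
print(likely_ties(pairs))                                  # {('A', 'B')}
print(weekly_regularity(swipes, "A", "gym", 3))            # 1.0
print(round(weekly_regularity(swipes, "C", "gym", 3), 2))  # 0.33
```

A real pipeline would also need to discount busy places and peak hours, where many strangers swipe within minutes of each other.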

More Data vs. Privacy

The challenge with expanding the number of factors and types of data used in the predictive models is that, at a certain point, the university risks infringing on students’ privacy. Several critics of Dr. Ram’s approach believe that she has already gone too far. Fortune writer David Meyer, for example, argues that these “analytics techniques may hold a lot of promise, but they also clash with privacy rights.” [8]

I strongly believe that UA should continue to expand the amount and types of data it uses to identify at-risk students. The more data it is able to collect, the better it will be at preventing students from dropping out. To do this, however, it must develop a proactive and robust data privacy policy. Students must be informed that this data is being collected and must be given the opportunity to opt out if they so choose. There is a risk that a large number of students will choose not to participate, but I believe that if UA clearly articulates its value proposition to students, few will opt out.

Questions Remain

I am somewhat unsure how students would respond if told about Dr. Ram’s data collection. I believe that students today are much more relaxed about sharing their personal data than students were a generation ago, and that most would not opt out. I am curious about my classmates’ perspectives on this. Would you allow the university to track your data if it meant the university would be more responsive in providing student support?


Word Count: 797




[1] Joseph B. Treaster, “Will You Graduate? Ask Big Data,” New York Times, February 2, 2017, accessed November 2018.

[2] Alexis Blue, “Researcher Looks at ‘Digital Traces’ to Help Students,” University of Arizona News, March 7, 2018, accessed November 2018.

[3] “UA Retention at an All-Time High,” University of Arizona News, October 4, 2017, accessed November 2018.

[4] Blue, “Researcher Looks at ‘Digital Traces’ to Help Students,” University of Arizona News.

[5] “UA Retention at an All-Time High,” University of Arizona News.

[6] S. Ram, Y. Wang, F. Currim, and S. Currim, “Using Big Data for Predicting Freshmen Retention,” in 2015 International Conference on Information Systems: Exploring the Information Frontier (ICIS 2015), Fort Worth, United States, December 13, 2015 (Association for Information Systems, 2015).

[7] Blue, “Researcher Looks at ‘Digital Traces’ to Help Students,” University of Arizona News.

[8] David Meyer, “An American University Is Spying on Students to Predict Dropouts. Here’s What That Says About Big Data in the U.S.,” Fortune, March 13, 2018, accessed November 2018.








5 thoughts on “Machine Learning Reduces Dropouts at the University of Arizona”

  1. This is a really interesting application of machine learning in that it serves a dual purpose: it aims to help both the university and its students. I don’t think younger people are as concerned about their data as older people, so I think UA might not receive much push-back on this implementation. In this discovery phase, collecting a large initial data set makes sense, since they are using anonymous data. I would hope that, with time, they are able to drop the data points that turn out to be unrelated to retention. Right now, collecting data on 800 factors seems excessive. I think it might assuage privacy critics if they reduced the data they collect to only the information most related to retention. My other concern is what the school will do once it identifies students who are at risk of dropping out. Identifying at-risk students earlier is important, but overall, I think they need more robust strategies during the orientation phase to keep students from becoming at-risk in the first place.

  2. When I was reading this article, I thought of my friends who struggled with college life. Struggling students tend to be quiet and do not actively seek help from the school, even when they get below-average scores on midterms and finals. At most universities, the current model of student support is to analyze students’ test scores and class participation after midterms or finals, but even after midterms may be too late in most cases. Therefore, I think it is a great idea for the University of Arizona to use all sorts of data to identify struggling students early in the semester and proactively offer help to improve student retention.

    I personally do not think millennial college students would be too concerned about schools using their personal data to provide student support. I am even in favor of personalized commercial ads driven by machine learning and precision marketing, because they are tailored to me and thus save me time. When it comes to school support, I cannot see why I would refuse such personalized services.

  3. Results count, and personally, it’s hard not to admire last year’s record-high 86.5% retention at UA. Moreover, UA is a large public research university; with a student body north of 35,000 undergraduates, it surely faces resource constraints in both identifying and assisting students in need along any time horizon, let alone within the first 12 weeks. Furthermore, even if many students recognize their struggles, they may not know what resources are available or how to seek them out, or may simply lack the self-confidence to reach out on their own. In short, the need and demand seem significant, particularly for students who lack traditional support systems such as family, friends and communities.

    I sympathize with the critics of the system and remain highly wary of unintended consequences of the data collection, particularly the precedent it sets for both i) future data collection and ii) use cases for that data. I would also note that the system will only become more effective as more data is collected and more variables are analyzed, which is clearly a slippery slope with regard to both collection and application. Thus, UA must be careful. While I believe that collecting data on nearly 800 factors can be extremely beneficial to students and the university itself, strong barriers need to be put in place:
    1) As an academic institution, UA must be explicit about each and every use case of the data. In my view, it should use the data only to see whether someone “hits the screen,” much as a diabetic typically uses a glucose monitor: only when applicable.
    2) Students must have the ability to opt out.
    3) Students must be informed each quarter of what data is being tracked (e.g., by email).
    4) A Chinese wall must be established so that few, if any, individuals can see both the data and students’ identities at the same time. All anyone should be able to see is whether someone “hits the screen,” with few exceptions.

    For an academic institution focused on helping individuals learn, grow and develop, this particular application of machine learning seems beneficial to all parties involved, but it must be pursued carefully.

  4. I think this is an outstanding application of machine learning, and I do not believe that students would be particularly concerned about data privacy, as younger individuals tend to be much more comfortable with and trusting of technology. My larger concern is how universities will decide to implement a retention solution on their own campuses. Building an algorithm in-house is very costly, requiring both data infrastructure and programmers, while sourcing a solution externally is risky as well. There are many companies in this space, from higher-education software vendors like Ellucian and Hobsons to entrenched LMS players like Blackboard and Moodle that are expanding beyond the LMS space, but it is very hard for universities to determine whose algorithm is best given the black-box nature of machine learning algorithms. While any proven solution is likely better than nothing at all, it would be a shame for a university to be dissuaded from making an investment because it does not want to turn its students’ data over to an external provider it doesn’t fully understand.

  5. I really enjoyed reading this and learning how machine learning appears to be a proven approach to improving retention rates at universities. I agree that data privacy will be a contentious point if this is to be adopted at other universities. But I also wonder how, and at what cost, UA is using machine learning: whether it is all in-house or through a third-party vendor. And if it is in-house, how much UA is investing in this initiative, and whether this might be a good software business opportunity to create a solution for the market if one doesn’t already exist.
