Duolingo: From Hello to Hallo through Machine Learning

Duolingo is using machine learning to help users acquire new languages and verify their current language skills – are broader educational applications for Duolingo's machine learning next?

Duolingo, one of the leading free language learning services, utilizes machine learning across thousands of exercises to help users learn and retain new languages.

One of the most important aspects of Duolingo’s product influenced by machine learning is measuring how easy it is to retain each word. Simply put, some words are easier or harder to remember, and Duolingo uses machine learning to more precisely identify this trend. Duolingo looks across every exercise ever completed and determines how easy or difficult a word is to remember based on how well students fared next time they saw that word. With enough data, it predicts how quickly students forget that word, and then predicts when it should test you on this word again. If you get the word right or wrong the next time you see it, Duolingo recalibrates the formula based on how other students like you learned the word over time, and tweaks how quickly you’ll be tested on the word again.1
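
The predict, schedule, and recalibrate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Duolingo's actual model: it assumes an exponential forgetting curve in which the chance of recalling a word halves every `half_life_days`, and the function names, the 90% recall target, and the doubling/halving update rule are all hypothetical. A production system would instead fit these values from the aggregated exercise data of many learners.

```python
import math


def recall_probability(days_since_seen: float, half_life_days: float) -> float:
    """Assumed forgetting curve: recall halves every `half_life_days`."""
    return 2.0 ** (-days_since_seen / half_life_days)


def days_until_review(half_life_days: float, target_recall: float = 0.9) -> float:
    """Days until predicted recall decays to `target_recall` (hypothetical threshold)."""
    return -half_life_days * math.log2(target_recall)


def update_half_life(half_life_days: float, was_correct: bool,
                     grow: float = 2.0, shrink: float = 0.5) -> float:
    """Recalibrate after a review: lengthen the half-life on a correct answer,
    shorten it on a miss. The multipliers are illustrative, not fitted."""
    return half_life_days * (grow if was_correct else shrink)
```

For example, with a half-life of 4 days, `recall_probability(4.0, 4.0)` evaluates to 0.5, and a correct answer under the illustrative rule pushes the half-life to 8 days, so the word is scheduled further into the future.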

This approach is important because everyone learns languages differently. Duolingo uses many methods, from matching exercises to rote translation, to help students learn. By studying how well students retain words based on different methods of learning, it can also optimize the algorithms to teach certain words using certain activities (or a clever combination of activities). Even the order in which concepts are taught can be optimized – should one learn adverbs or past tense verbs first?2
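
One simple way to picture the comparison of learning methods is to group past attempts by word and exercise type and look at how often the word was recalled later. The sketch below is a hypothetical illustration, not Duolingo's method: the `attempts` record layout and the function name are assumptions, and it ignores sample size and differences between learners, which a real analysis would have to handle.

```python
from collections import defaultdict


def best_exercise_per_word(attempts):
    """Pick, for each word, the exercise type with the highest later-recall rate.

    `attempts` is an iterable of (word, exercise_type, recalled_later) tuples,
    a hypothetical record format used only for this illustration.
    Returns {word: (exercise_type, recall_rate)}.
    """
    totals = defaultdict(lambda: [0, 0])  # (word, exercise) -> [successes, trials]
    for word, exercise, recalled in attempts:
        stats = totals[(word, exercise)]
        stats[0] += int(recalled)
        stats[1] += 1

    best = {}
    for (word, exercise), (successes, trials) in totals.items():
        rate = successes / trials
        # Keep the exercise with the strictly highest observed recall rate.
        if word not in best or rate > best[word][1]:
            best[word] = (exercise, rate)
    return best
```

With only a handful of attempts per word the observed rates are noisy, which is the same data-volume problem the article raises for less popular language pairs.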

Duolingo has also developed an English proficiency test that measures a test taker’s English abilities. To build the test, Duolingo analyzed the Common European Framework of Reference (CEFR), the global standard for measuring language proficiency. It processed tens of thousands of passages corresponding to different language levels and used machine learning techniques to develop algorithms that identify which words or concepts are most important to test at each level.3

Looking forward, Duolingo is focused on driving more users to the platform. Machine learning techniques work best when they have a robust data set to work from. While some language pairs, like English to Spanish, are used at a higher frequency, other pairs have fewer users, which makes it more difficult to train the algorithms – what might be an easy cognate for an English speaker learning Spanish might be a totally different-looking and more challenging word when the speaker’s native language is Hungarian.

Duolingo also wants to improve its ability to help students reach more advanced language proficiency. To address this, it is developing the Stories feature, which lets students review long-form passages with more complex words and answer questions about what they’ve heard or read. Unlike the traditional Duolingo approach, Stories offers a set number of stories and topics that aren’t algorithmically generated. This highlights the limitation of Duolingo’s approach at advanced levels – few users make it to the end of the lessons, so there is less data to track for more advanced words and topics. As the volume of data thins and as the language (and how people acquire it) becomes more complex, Duolingo has tweaked the algorithm-based, machine learning-driven approach that made its core offering so successful.4

Long term, Duolingo is exploring how to take its insights from language learning and apply them to other subjects, from physics to reading. It’s extremely difficult for many across the world to access a high-quality education, and while services like Duolingo don’t address the structural issues that prevent people from accessing high-quality education, Duolingo could provide a similar machine learning-driven approach to help those without resources learn more than just language.5

There is still opportunity to improve Duolingo’s approach. Duolingo focuses its machine learning on what users put into the app, or how well users recall a word when it comes up on the screen. But as any language learner knows, stepping up to a counter and ordering something in a new language is very different than remembering flashcards. For users who are trying to use their new language as they learn it, Duolingo can feel disconnected from this real-life experience. But real life provides an incredibly rich data set for Duolingo to analyze, and finding a way to integrate it into its algorithms would be a huge next step. Whether it’s allowing users to input words that they forgot each day or even allowing Duolingo to capture and analyze users’ spoken conversations, the platform could better track how users are actually behaving and provide even better recommendations about what to learn or improve next.

One critical question is how improved translation services will change the role and necessity of language learning. As Google Translate and other similar services get better at accurately translating longer and more complex passages, will people feel compelled to learn new languages? Do Google Translate and similar apps represent a competitive threat for Duolingo?

  1. http://making.duolingo.com/how-we-learn-how-you-learn
  2. https://www.zdnet.com/article/duolingo-the-app-teaching-humans-new-languages-while-teaching-machines-how-to-learn/
  3. https://s3.amazonaws.com/duolingo-papers/other/DET_ShortPaper.pdf
  4. https://forum.duolingo.com/comment/23206010/Introducing-Duolingo-StoriesBeta
  5. https://www.quora.com/What-are-Luis-Von-Ahns-future-plans-for-Duolingo

 

6 thoughts on “Duolingo: From Hello to Hallo through Machine Learning”

  1. Awesome article and really interesting read! It seems like Duolingo’s core competency is really well suited for machine learning because of the size of the data set and the type of data. One limitation, as you mentioned, is that learning in an app is very different than stepping up to a counter and placing an order. I wonder if, as machine learning advances, Duolingo would be able to use their same machine learning skills for live video streamed interactions, thereby utilizing their core competency while improving their offering to be more realistic.

  2. Really interesting article and well written! Highlighting how Duolingo is able to identify which words in a given language are harder or easier to remember is a key impact of machine learning on the company. One question I have is how Duolingo can expand the amount of data they are incorporating into their machine learning algorithms. One of the limitations of machine learning across the board is the access to quality information in representative quantities, and I wonder how Duolingo can continue to address this issue.

  3. Great work, Jackson — really enjoyed reading your post (especially as a Duolingo user)! It was interesting to learn that most of Duolingo’s machine learning is geared toward optimizing their lessons across all users, rather than tailoring content to individual learning styles — this definitely seems like an improvement opportunity going forward. That should also help with the sentiment that I (and apparently many others) have of the content being a bit disconnected from normal language use.

  4. Great post, Jackson! Very interesting to understand how machine learning can make learning foreign languages much more efficient!
    Technologies such as Google translator can be a “magical” tool for travelers and people who don’t use foreign language frequently, acting as a counter-incentive for the learning of a new language. This might cause the market of learning foreign languages to decrease in the future and put Duolingo in a challenging situation. One possible strategy for Duolingo would be, as mentioned in the post, to leverage their capabilities and play in the learning of traditional subjects, such as physics and reading.

  5. Love this! It’s fascinating to hear that Duolingo is working on applying their technology to other subjects. I wonder how much value machine learning will add in other areas of study, or whether the complexity of language learning is uniquely fitted to ML. For example, in elementary math, it seems intuitive that counting must come before addition, which must come before multiplication (whereas, as you mentioned, this linear relationship doesn’t exist as clearly in languages between adverbs, past tense words, etc). Regardless, any level of increased individualization in this sort of highly accessible learning platform will be great to see!

  6. Jackson, this reading is great! As a person who did not speak a certain language (=English) in everyday life and lost the capability, I wish there were some sort of service like this. I believe that “Forgetting curve” can be applied to any topics from languages to physics as you mentioned, and there is a huge potential. Regarding Google Translate, I still think learning a language is more than simply acquiring a tool to exchange words, but also about understanding the culture and mindset behind, so this service will be powerful!
