In 2013, the Merriam-Webster dictionary changed the definition of the word “literally” to include its figurative meaning. By doing so, Merriam-Webster gave a nod to the fluid nature of language. But when should we stick by traditional language rules and when can we adapt? This is the question facing Grammarly as it investigates the use of crowd-sourced data for its AI engine. If users decide they don’t care about using the correct version of “your”, should Grammarly’s model change its understanding of “correct” language?
A Brief Overview
Founded in 2008, Grammarly’s mission is to “Improve Lives by Improving Communication”. Grammarly’s core product is software that makes spelling and grammatical recommendations. Grammarly currently has a freemium model as its primary revenue source and offers several integrations, including a Microsoft Office add-in and a Google Chrome extension.
Product Development and Improvement at Grammarly
At the heart of Grammarly’s core product (and the products of competitors Microsoft and Google) is an AI that identifies the intended meaning of a phrase using contextual clues. That AI must then serve up suggestions on how to improve the phrase. Due to the scope of such a problem, a complex learning algorithm is needed, in this case likely a recurrent neural network model (this article focuses on data acquisition, as opposed to algorithm selection/design).
Data acquisition is critical because Grammarly’s competitive advantage over Google is never going to be its expertise in AI. Instead, it will continue to be its ability to crowd-source training data for its algorithms. Traditionally, Grammarly collects data in two different forms:
- Public data-sets
  - For example, the AESW 2016 Shared Task corpus was created by a single editor making edits to scientific journal articles written in English by non-native English speakers.
- Crowd-sourced user data
  - Whenever users decide to accept or ignore a suggestion by Grammarly, they are providing data back to the algorithm. Due to its integration across multiple platforms, Grammarly has access to data from more contexts than Microsoft or Google.
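As a rough illustration of how accept/ignore decisions could become a signal, the sketch below turns a handful of feedback events into an acceptance rate per suggestion type. The event format, the suggestion names, and the `acceptance_rates` helper are all hypothetical and chosen for illustration. They do not describe Grammarly’s actual pipeline.

```python
from collections import defaultdict


def acceptance_rates(events):
    """Return {suggestion_type: fraction of suggestions accepted}.

    `events` is an iterable of (suggestion_type, accepted) pairs, where
    `accepted` is True if the user took the suggestion and False if the
    user ignored it. This event shape is an assumption for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # type -> [accepted, total]
    for suggestion_type, accepted in events:
        counts[suggestion_type][0] += int(accepted)
        counts[suggestion_type][1] += 1
    return {t: a / n for t, (a, n) in counts.items()}


# Hypothetical feedback: most users ignore the "your" correction.
events = [
    ("your_vs_youre", True),
    ("your_vs_youre", False),
    ("your_vs_youre", False),
    ("comma_splice", True),
]
rates = acceptance_rates(events)
```

A persistently low rate for a suggestion type is exactly the situation raised in the opening paragraph: the data alone cannot say whether the rule is outdated or the users are simply wrong.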
Using these sources, Grammarly can continue improving its core service. But to continue innovating they will need a new sourcing strategy.
Next Steps for Grammarly
The future of written communication and of Grammarly is about more than just grammatical correctness. Users are increasingly looking for more stylistic suggestions. There are many ways to write the same grammatically correct sentence. But some of those sentences are easier to read, maybe because they flow better or use more accessible language.
To keep up with user demands, Grammarly will need to be able to differentiate between more and less coherent sentences. This is a problem because it requires a new type of data that doesn’t currently exist. Grammarly needs data that doesn’t just look at spot changes, but at more fundamental changes to the underlying sentence structure.
To get this initial data, Grammarly commissioned a study that took text data from various sources, including Yahoo Answers and Yelp reviews. They then paid 50 editors crowd-sourced via Mechanical Turk to go through the data and provide holistic changes to the text to improve its overall coherence.
While this initial data-set will help kick-start a new phase of innovation at Grammarly, they face two long-term difficulties as they prove out this technology:
First, paid manual editing is cost-prohibitive. Second, as Grammarly moves further away from rules-based suggestions, it will become increasingly difficult to maintain control over what their AI decides is the “correct” suggestion. For example, even in the above case with expert editors, the editors had relatively low agreement on what constituted a coherent sentence.
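To make “low agreement” concrete, here is a minimal sketch of Cohen’s kappa, a common measure of how often two raters agree beyond what chance would produce. The article does not say which agreement measure the study used, so kappa is a stand-in. The editor ratings and the three-level labels below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    1.0 means perfect agreement, 0.0 means agreement no better than
    chance, and negative values mean worse than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled at random with their
    # own observed label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical coherence ratings from two editors on six texts.
editor_1 = ["high", "low", "medium", "high", "low", "medium"]
editor_2 = ["high", "medium", "medium", "low", "low", "high"]
kappa = cohens_kappa(editor_1, editor_2)  # 0.25 for this made-up data
```

In this made-up example the editors agree on half of the texts, yet kappa is only 0.25 once chance agreement is removed, which shows how quickly a subjective label like “coherent” can become an unreliable training target.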
My suggestions for Grammarly are to continue expanding their integration across various platforms, focusing particularly on mobile, to maintain their competitive advantage in data collection. I would also suggest rolling out a “coherence” suggestion pilot as soon as possible to start collecting valuable data. Risks are limited because users who opt in to the pilot can always ignore suggestions.
Finally, I would continue looking forward and innovating. The natural next step is to look beyond cohesion of a single sentence to cohesion of a paragraph, or entire document. This is where platform context becomes so important since what makes a tweet cohesive is very different from cohesion in an email or paper.
Clearly there is a lot of value yet to be created in this space. Grammarly’s ability to navigate the data collection process will be the key to their long-term success. But we still haven’t answered the fundamental tension faced by crowd-trained ML algorithms. Can we trust the crowd to produce the “correct” answer, especially when it comes to something as fluid and variable as human language?
Note: All spelling, grammatical, and stylistic errors in this article are puns made intentionally by the author
 Geron, T. (2018). Grammarly, With $110 Million, Brings Artificial Intelligence to Writing. [online] WSJ. Available at: https://www.wsj.com/articles/grammarly-with-110-million-brings-artificial-intelligence-to-writing-1494243003 [Accessed 13 Nov. 2018].
 Merriam-webster.com. (2018). Did We Change the Definition of ‘Literally’?. [online] Available at: https://www.merriam-webster.com/words-at-play/misuse-of-literally [Accessed 13 Nov. 2018].
 Grammarly.com. (2018). Help Grammarly improve lives by improving communication.. [online] Available at: https://www.grammarly.com/jobs [Accessed 13 Nov. 2018].
 Medium. (2018). How Grammarly Quietly Grew Its Way to 6.9 Million Daily Users in 9 Years. [online] Available at: https://medium.com/swlh/how-grammarly-quietly-grew-its-way-to-6-9-million-daily-users-in-9-years-88e417dbfbdf [Accessed 13 Nov. 2018].
 Alice Lai and Joel Tetreault, “Discourse Coherence in the Wild: A Dataset, Evaluation and Methods,” arXiv:1805.04993, May 2018.
 Textmining.lt. (2018). AESW 2016. [online] Available at: http://textmining.lt/aesw/index.html [Accessed 13 Nov. 2018].
 Tech.grammarly.com. (2018). Paving the way for human-level sentence corrections. [online] Available at: https://tech.grammarly.com/blog/paving-the-way-for-human-level-sentence-corrections [Accessed 13 Nov. 2018].