Twitter, a popular online social networking site, facilitates the communication of ideas among individuals, companies, and organizations. Founded in 2006, Twitter now has over 300 million active users worldwide. In a nutshell, Twitter is a social media communication platform on which users exchange “tweets,” short messages (280 characters or fewer) capturing ideas, news, and reactions. The platform processes roughly 6,000 messages per second, or nearly 200 billion per year.
Given such a high volume of messages, content is largely unfiltered. In fact, Twitter’s business model is built upon real-time communication, so any review system would impede posting speed and run counter to the site’s purpose. The lack of filtering has led to a rise in spam and automatically generated content. Prior to implementing detection mechanisms, Twitter relied on users to report incidents of suspected spam. Users submit thousands of spam reports daily, identifying sources of irrelevant or inappropriate content with the intention of disabling spam accounts.
As Twitter notes, “inauthentic accounts, spam, and malicious automation disrupt everyone’s experience on Twitter.” Combating spam has emerged as a top priority for Twitter: like other internet-based companies, its corporate valuation depends on the size and engagement level of its user base, and dissatisfied users could leave the site or become less active. Advances in machine learning have presented an opportunity for Twitter to leverage computing power and known user trends to identify and automatically disable accounts producing spam, and Twitter has invested accordingly. For a company whose “product” is a robust base of news and ideas, spam directly threatens the site’s value proposition, and action against it is necessary for the company’s survival and success.
In a recent memo, Twitter reaffirmed its commitment to ensuring that shared content is reliable, trustworthy, and relevant. The rampant incidence of spam motivated Twitter’s decision to invest in machine learning technology to automatically identify and take action against accounts producing spam. This approach fights spam proactively, rather than waiting to receive and verify suspected spam reports submitted by users. Since implementation last year, Twitter has identified three times as many spam accounts (10 million per month as of May). Consequently, user-submitted spam reports have declined from roughly 25,000 per day in March to 17,000 per day in May.
Research conducted by Alex Hai Wang at Pennsylvania State University describes the mechanism behind Twitter’s approach to spam detection. His paper, “Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach,” highlights the numerical features that detection algorithms leverage to identify “spammy” accounts. The algorithm extracts the number of, and the relationships among, a user’s friends and followers to evaluate the user’s authenticity. According to Wang, this machine learning approach is “an efficient and accurate [method] to identify spam bots in Twitter.”
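A minimal sketch of the kind of friend/follower feature check Wang describes can make the idea concrete. The feature name, threshold, and account data below are illustrative assumptions for exposition, not values from the paper or from Twitter’s production system:

```python
# Sketch of graph-based spam detection using friend/follower counts.
# The "reputation" feature and the 0.1 cutoff are illustrative
# assumptions, not Wang's or Twitter's actual parameters.

def extract_features(account):
    """Derive a simple graph feature from friend/follower counts."""
    friends = account["friends"]      # accounts this user follows
    followers = account["followers"]  # accounts following this user
    total = friends + followers
    # Spam bots often follow many accounts but attract few followers back,
    # so their share of followers among all connections tends to be low.
    reputation = followers / total if total else 0.0
    return {"friends": friends, "followers": followers, "reputation": reputation}

def looks_like_spam(account, reputation_threshold=0.1):
    """Flag accounts whose follower reputation falls below a cutoff."""
    return extract_features(account)["reputation"] < reputation_threshold

bot = {"friends": 5000, "followers": 20}     # follows many, followed by few
human = {"friends": 300, "followers": 250}   # roughly balanced connections
print(looks_like_spam(bot))    # True under these assumptions
print(looks_like_spam(human))  # False under these assumptions
```

In practice, Wang combines several such graph and content features in a trained classifier rather than a single hand-set threshold; the sketch only shows how friend/follower relationships become numerical inputs.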
Twitter has applied this machine learning approach since 2017 and continues to see positive results. In the near term, Twitter expects to continue calibrating and improving its algorithms to improve spam detection. According to a research report from Deakin University in Australia, Twitter’s current methods and techniques achieve an accuracy rate of approximately 80%. While quite successful already, Twitter can push its accuracy rate further toward 100%. Presently, roughly 20% of classifications are errors: some spam escapes detection, while some legitimate accounts are incorrectly labeled as spam, interrupting their ability to contribute content. Over the next few years, it is reasonable to expect Twitter to close this gap and achieve accuracy rates closer to 100%.
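Overall accuracy alone does not reveal how many legitimate accounts are wrongly suspended, which is the error users feel most directly. A small worked example separates the two measures; all counts below are invented for illustration and are not Twitter data:

```python
# Illustration of why an 80% accuracy figure does not by itself
# describe how many legitimate accounts are wrongly flagged.
# All counts are made-up example numbers, not Twitter data.

def metrics(tp, fp, tn, fn):
    """Compute accuracy and false positive rate from a confusion matrix.

    tp: spam correctly flagged        fp: legitimate wrongly flagged
    tn: legitimate correctly passed   fn: spam missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Share of legitimate accounts that get wrongly flagged as spam.
    false_positive_rate = fp / (fp + tn)
    return accuracy, false_positive_rate

# Example: 1,000 accounts, 100 of them spam bots.
acc, fpr = metrics(tp=80, fp=180, tn=720, fn=20)
print(f"accuracy={acc:.0%}, false positive rate={fpr:.0%}")
# accuracy=80%, false positive rate=20%
```

Because spam accounts are a minority of the population, the same 80% accuracy could correspond to very different false positive rates depending on how the errors are distributed, which is why the customer-impact question at the end of this memo matters.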
Creating a Twitter account involves specifying a name and a phone number or e-mail address, which is then verified via a code sent to the user. Sophisticated bots can generate fake phone numbers and e-mail addresses and successfully create Twitter accounts. As a suggestion, Twitter could adopt Google’s reCAPTCHA technology, which requires users to decipher blurry words or phrases and enter them to proceed with making an account. While the step is not insurmountable, unsophisticated bots struggle to pass the reCAPTCHA challenge and would thus be prevented from creating spam accounts.
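A hypothetical sketch of where such a challenge could sit in the signup flow follows. The function names and logic here are invented for illustration; a real integration would call the CAPTCHA provider’s verification endpoint rather than compare strings locally:

```python
# Hypothetical sketch of a CAPTCHA gate in account creation.
# verify_captcha stands in for a call to a service such as Google's
# reCAPTCHA verification API; the string comparison is a placeholder.

def verify_captcha(response_token, expected="correct-horse"):
    """Placeholder check; a real implementation would POST the token
    to the CAPTCHA provider and inspect the returned success flag."""
    return response_token == expected

def create_account(name, contact, captcha_response):
    """Create an account only if the CAPTCHA challenge was passed."""
    if not verify_captcha(captcha_response):
        return None  # block automated signups that fail the challenge
    return {"name": name, "contact": contact, "verified": False}

print(create_account("bot123", "fake@example.com", "gibberish"))       # None
print(create_account("alice", "alice@example.com", "correct-horse"))   # account dict
```

The design point is simply ordering: the challenge runs before the phone/e-mail verification code is ever sent, so cheaply generated contact details alone are no longer sufficient to open an account.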
Question for further discussion: What is the customer impact of falsely identifying a Twitter account as spam and suspending it?
Word Count: 702
Twitter, Inc. Annual Report 2018 (February 2018). http://www.viewproxy.com/Twitter/2018/AnnualReport2017.pdf. Accessed November 13, 2018.
Twitter, Inc. “How Twitter is Fighting Spam and Malicious Automation” (June 2018). https://blog.twitter.com/official/en_us/topics/company/2018/how-twitter-is-fighting-spam-and-malicious-automation.html. Accessed November 13, 2018.
Wang, Alex Hai. “Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach” (June 2010). Data and Applications Security and Privacy. Accessed November 13, 2018.
Wu, Tingmin, et al. “Twitter Spam Detection Based on Deep Learning” (February 2017). Australasian Computer Science Week. Accessed November 13, 2018.
Beede, Rodney. “Analysis of reCAPTCHA Effectiveness” (December 2010). Computer Vision. Accessed November 13, 2018.