Around 90% of emails sent around the world are spam . Luckily, Google has spent significant time and money on Gmail’s spam filters, making them very effective at making sure the vast majority of these spam emails don’t make it to our inboxes.
How do they do it? By constantly refining their spam filter algorithms.
Gmail looks at various inputs for each email (see Exhibit): who is the sender (and what we already know about them), what’s their IP address, what is the content etc., and tries to classify it as either “spam” or “not spam”. If an email is deemed “not spam”, it shows up user’s inbox – otherwise it’s sent directly to the “spam” folder. But then, importantly, Gmail also monitors users’ behaviors to refine its filter’s decision-making process. By monitoring users’ engagement with the email, it can close the feedback loop and improve its predictive power. If, for example, emails with the words “Magic pill” and “weight loss” tend to be flagged by users as “spam”, or are just deleted as soon as they are received, the Gmail team might decide to penalize emails with those words, increasing the likelihood of blocking them. In fact, in all likelihood, the Gmail team doesn’t discuss such individual decisions, but rather on rules to apply so that the algorithm picks up on these patterns on its own. Ever pushing to improve its filter, this year Gmail announced that it will expand the list of things its filter looks at to include each recipient’s taste, as inferred for their Gmail activity history (e.g., what they’re generally interested in) . Oh, and if this smells like direct network effects to you, you’re not mistaken.
The value created to users is significant. With almost a billion users , and using my own personal spam folder to estimate 20 spam emails per day, and ~1 second saved per spam email, I estimate ~200,000 man-years saved per year. On a per-user basis the numbers are not as impressive, but with little differentiation between email services, a bad spam filter just might be what’s needed to make a user switch. That is not to mention the benefit of not falling prey to the various scams out there.
The value is captured somewhat indirectly. By offering a better email experience, Google hopes to get even more users to switch to Gmail, and to use it more extensively (at the expense of its competitors). With more users, Gmail can of course display more email ads. But even more than that, there are multiple synergies with Google’s other products, the most obvious one is probably better targeting of Google search ads based on the user’s online behavior, as it’s manifested in their email account (e.g., if a user receives lots of emails from Amazon, maybe they’re more likely to click a link to Amazon’s website).
This brings us to two unique challenges faced by Gmail, when designing its spam filters. One, it never actually observes whether it made the right decision. It can use proxies, such as whether the user ignored the message, or deleted it, as a proxy to whether they were interested in receiving it in the first place. But it doesn’t really know – maybe the user wanted to receive the email, but was just quick to process the information contained in it. Another challenge is understanding the cost/value of a false negative and false positive. Unlike a gaming company that can easily track whether they increased spending per user, with spam filters, it’s not obvious whether the users are actually better off after making a change, and even more so whether Google overall is better off with all the aforementioned synergies.
Going forward, the spam filter development is likely to continue its cat-and-mouse dynamic. Spammers are constantly trying to adapt to the rules adopted by Gmail’s spam filters. Already the internet is full of lists such as “16 Ways To Get Your Email Past Spam Filters” . What’s almost certain though, is that the world today is far more spam-free than it was ten years ago (e.g., 2010 showed an ~80% drop is spam volume ) – perhaps a result of better filtering eroding the financial upside of spamming – making the digital world a better place.
Photo credit: techydudes.com