When asked to describe machine learning in layman’s terms, quantitative futurist Amy Webb put it simply as “a branch of computer science in which computers are programmed to do things that normally require human intelligence.”
Since their inception, newsrooms have been packed with journalists hungry to uncover the truth and share it with the public. Yet the rise of free digital sites has put the industry at a crossroads, and machine learning may well be its last remaining lifeboat.
AI could not have replaced Woodward and Bernstein in uncovering the Watergate scandal; that required building trust and rapport with sources in a way no machine could. But machine learning did significantly help the ICIJ (International Consortium of Investigative Journalists) sort through the millions of documents uncovered in the Panama Papers scandal [1]. With data dumps becoming more common via sites like WikiLeaks, journalists now look toward machines for much-needed assistance in finding the “truth” needle in the haystack.
Beyond its use as a “truth” search engine, news organizations have also leveraged AI as an investigative tool: BuzzFeed News designed algorithms to help detect undercover spy planes [2], and ProPublica detected patterns in politicians’ stated interests [3]. Some have even taken the extra step of replacing video editors and news hosts altogether [4], however experimentally.
The case of the management of Infobae.com, a leading Argentine news site, is slightly different. In the words of feature editor Loreley Gaffoglio: “We look toward the US, and now China, mostly for our sources of inspiration. South American news organizations are not seen as innovation thought leaders. But with some of the world’s most impactful corruption scandals (Petrolão in Brazil, Cuadernos in Argentina), we are uniquely poised to use this technology as a public service.”
Indeed, the status quo of corrupt government organizations can, perhaps for the first time, be challenged in this region, since its main pillars (fraudulent contracts and bribes) always leave behind a money or paper trail. For the first time, news organizations can deploy tools to sift through government documents and bank transactions and uncover the red flags.
In the short term the site is looking to partner with banking consultants to leverage existing fraud-detection technology. “My bank knows within seconds if a transaction is questionable; they will block it and inform me. Why not use that same technology to help us detect bribes?” comments the editor. This could open the door to stopping corrupt practices as they happen, pointing journalists to the right lead instead of wasting their time down a dead end. “Nothing eliminates the importance of whistleblowers and confidential sources, but this helps us know where to look for them, and that’s already 80% of the job.” In the long term the organization is looking to expand fraud detection to also include contracts. “Corrupt practices are always hidden within government; we have found it incredibly difficult to effectively identify these, but this is where our efforts are headed.”
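The bank analogy the editor draws can be sketched in a few lines. This is a hedged illustration only, not Infobae’s or any bank’s actual system: a toy rule that flags a transfer sitting far outside an account’s recent history, with the threshold and field names assumed for the example.

```python
# Toy sketch of a real-time "questionable transaction" check, in the spirit
# of the bank rule the editor describes. The multiplier and the shape of the
# data are assumptions for illustration, not a real fraud-detection system.
from statistics import mean

def is_questionable(history, amount, multiplier=5.0):
    """Flag a transfer that exceeds `multiplier` times the account's
    average recent transfer amount."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    return amount > multiplier * mean(history)

history = [120.0, 80.0, 150.0, 95.0]     # recent transfer amounts
print(is_questionable(history, 100.0))   # routine transfer -> False
print(is_questionable(history, 5000.0))  # sudden large transfer -> True
```

A real system would of course combine many such signals (counterparty, timing, jurisdiction), but even this single-rule version shows why a machine can flag a lead “within seconds” while a human cannot.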
It is on this aspect, precisely because of its difficulties and potential, that I would advise the organization to focus. By chartering a team, possibly in partnership with the ICIJ, it could develop software that intelligently detects common fraudulent practices in contracts. The main difficulty is striking the balance between supervised and unsupervised learning, much like the struggles the IBM Watson team faced. A truly useful program would be able to identify discrepancies between the legal measures applied and the scope of the project under contract. A purely supervised model could only identify corrupt practices resembling those already known (assuming the program had sufficient training data), but as corruption evolves it would always stay a step behind, and therefore fail as an investigative tool for uncovering something new. An unsupervised model, by contrast, would require a much larger data set but could potentially be more intuitive; it could also deliver results with a confidence percentage, as Watson did, and so point journalists toward the most promising leads for verification.
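To make the unsupervised direction concrete, here is a minimal sketch under assumed data: score each contract by how far its cost per unit of delivered scope deviates from the norm, then rank contracts by that anomaly score so journalists can start with the most suspicious leads. Every identifier and figure below is invented for illustration; a real system would use far richer features than a single ratio.

```python
# Minimal unsupervised "red flag" ranking for contracts. Each contract's
# cost per unit of scope is compared against the group; the further it sits
# from the mean (in standard deviations), the higher its anomaly score.
# All data and field names are hypothetical.
from statistics import mean, stdev

contracts = [
    {"id": "C-001", "value": 1.00e6, "scope_units": 100},
    {"id": "C-002", "value": 1.10e6, "scope_units": 110},
    {"id": "C-003", "value": 0.90e6, "scope_units": 95},
    {"id": "C-004", "value": 5.00e6, "scope_units": 90},   # overpriced for its scope
    {"id": "C-005", "value": 1.05e6, "scope_units": 105},
]

def flag_contracts(contracts):
    """Return contracts ranked by anomaly score (higher = more suspicious)."""
    rates = [c["value"] / c["scope_units"] for c in contracts]
    mu, sigma = mean(rates), stdev(rates)
    scored = [{**c, "score": round(abs(r - mu) / sigma, 2)}
              for c, r in zip(contracts, rates)]
    return sorted(scored, key=lambda c: c["score"], reverse=True)

for c in flag_contracts(contracts):
    print(c["id"], c["score"])
```

Note that nothing here was trained on labeled examples of corruption, which is exactly the appeal: the model surfaces whatever deviates from the pattern, known scheme or not, and the score plays the role of the Watson-style confidence figure for prioritizing verification.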
The main problem would then lie in the organization’s focus and allocation of resources. As the editor explains: “Our main issue is that we live minute to minute, in an endless cycle of breaking news worsened by our digital environment. We would like to have the tools to help investigations, but it’s hard to dedicate an entire team to something that’s years down the line when we’re thinking about what we’re going to publish two hours from now.” And so I look to my classmates to help solve this conundrum. What would be the best way to establish an AI team within a news organization with limited resources? What should be its first area of focus? Should this be internally led, or built upon others’ inventions?
Word Count: 797