How does machine learning improve the credibility of online information?

An experiment using artificial intelligence to detect fake news

Traditional methods of assessing media content are resource-intensive and time-consuming. Hence the idea of exploring what machine learning can offer: it can help us spot news articles that contain the opinions and prejudices of the journalists themselves. IREX[1] and Lore.ai conducted an experiment that used machine learning to improve the detection of fake news ("infox").

The experiment tests the feasibility of using machine learning to automatically assess the quality of media content in Mozambique. It is aimed at media support professionals who want to take advantage of innovative digital tools to amplify their work, as well as non-governmental organizations (NGOs) and global development organizations in general who are striving to find practical ways of using this technology and applying it meaningfully and responsibly to their work.

They proceeded in five steps:

1. Define the problem statement and choose a specific media quality indicator,
2. Load 1,200 online news articles into machine learning software,
3. Show the software examples of articles containing opinionated content,
4. Check and correct the software's suggestions, and
5. Evaluate the results and repeat steps 3 and 4.

The team used software to scan the websites of nine major print media outlets and imported over 1,200 articles into the machine learning software. The evaluators then began training the tool to identify opinions in the text.

Using the software's "highlighter" tool, evaluators clicked on sentences in the articles to show examples of opinions. Using these examples, the software identified patterns and looked for other similar sentences. Evaluators examined the sentences the software flagged as possible opinions and determined whether the suggestion was correct or incorrect. The team performed this feedback loop 51 times.
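The feedback loop described above can be sketched roughly in Python. This is only an illustration of the general human-in-the-loop idea, not the method used by the Lore.ai software: the word-count scorer, the example sentences, and the names OpinionScorer, feedback_loop and ask_evaluator are all hypothetical, and the human evaluator is simulated by a lookup table.

```python
from collections import Counter
import math


def tokenize(sentence):
    # Very rough tokenizer: lowercase and split on whitespace.
    return sentence.lower().split()


class OpinionScorer:
    """Toy word-count model (hypothetical stand-in for the real software)."""

    def __init__(self):
        # Word counts for sentences labeled opinion (True) or not (False).
        self.counts = {True: Counter(), False: Counter()}

    def learn(self, sentence, is_opinion):
        self.counts[is_opinion].update(tokenize(sentence))

    def score(self, sentence):
        # Sum of smoothed log-odds: positive means "looks like an opinion".
        pos, neg = self.counts[True], self.counts[False]
        pos_total, neg_total = sum(pos.values()), sum(neg.values())
        vocab = len(set(pos) | set(neg)) or 1
        total = 0.0
        for word in tokenize(sentence):
            p = (pos[word] + 1) / (pos_total + vocab)
            q = (neg[word] + 1) / (neg_total + vocab)
            total += math.log(p / q)
        return total


def feedback_loop(model, unlabeled, ask_evaluator, rounds):
    """Each round: the model flags its best candidate, the evaluator
    confirms or rejects it, and the model learns from the answer."""
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        candidate = max(pool, key=model.score)
        pool.remove(candidate)
        model.learn(candidate, ask_evaluator(candidate))
    return model


# Hypothetical sentences; the dictionary plays the role of the human evaluator.
truth = {
    "the policy is shameful and wrong": True,
    "the minister published the decree": False,
    "the budget is wrong": True,
}

model = OpinionScorer()
# Seed examples, as if highlighted by an evaluator.
model.learn("this decision is shameful", True)
model.learn("the minister signed the decree", False)

feedback_loop(model, truth.keys(), truth.__getitem__, rounds=3)

opinion_score = model.score("the plan is wrong")
fact_score = model.score("the minister signed")
print(opinion_score > fact_score)
```

In the experiment this confirm-or-correct cycle was repeated 51 times; in the sketch, each extra round simply adds one more labeled sentence to the model's word counts.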

This experiment tested whether software can automatically assess the impartiality of online news articles, in particular by identifying the opinions expressed in news articles.

The results suggest that machine learning can help us find opinions in news articles. The software identified these articles reliably enough to be applied to the monitoring and evaluation of media content, at a scale and speed that far exceed those of conventional human evaluators (in seconds, rather than minutes or hours). Here are some of the key results of the experiment:

95% accuracy: the software classified articles correctly, as containing an opinion or not, 95 times out of 100.
84% precision: out of 100 articles the software flagged as containing opinions or fake news, 84 actually contained them.
93% recall: out of 100 articles containing opinions, the software found 93, but missed 7.
The more the model was trained, the greater its accuracy and precision: There is a clear relationship between the number of times the evaluators trained the software and the accuracy and precision of the results. Recall results did not improve as consistently over time.
The software's ability to "learn" was almost immediately evident: The evaluation team noticed a marked improvement in the accuracy of the software's suggestions after showing just 20 sentences that had opinions.
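To make the difference between accuracy, precision and recall concrete, here is a minimal Python sketch. The counts are hypothetical, chosen only for illustration, and are not the figures from the IREX experiment; the function name classification_metrics is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts.

    tp: articles flagged as opinion that really contain one
    fp: articles flagged as opinion that do not contain one
    fn: articles with an opinion that the software missed
    tn: articles without opinion that were correctly left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts for 100 articles (not the experiment's data).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Precision answers "when the software flags an article, how often is it right?", while recall answers "of the articles that contain opinions, how many did it find?". The two can move independently, which is consistent with recall improving less steadily than accuracy and precision during training.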

Another point of reference is the Stanford Question Answering Dataset, or SQuAD, a benchmarking competition that measures how well AIs perform at reading a text and answering questions about it. The first SQuAD prize was won by a team from Salesforce's Artificial Intelligence Research Center, whose AI reached about 80% accuracy on this type of task.

Still free to say anything, but...

Fabula AI, a British start-up, is another example. Using its own technology, which it calls Geometric Deep Learning, it has developed an artificial intelligence capable of detecting fake news with a very promising success rate of 93%. This high success rate helps explain its recent acquisition by Twitter, which states: "Fabula AI will enable us to analyze large and complex datasets. This startup's technology will be a real driver for our platform and should enable us to strive to offer better quality information and make our communities feel safer"[2].

Ultimately, it is impossible[4] to totally eliminate human bias, or to encode it perfectly in machine learning applications. So machine learning doesn't eliminate bias[5], but it does apply it more consistently. More research and experimentation are also needed: machine learning can help us spend resources more efficiently, but greater familiarity with the technology is required to realize its potential appropriately and responsibly.

Measuring media quality is a subjective exercise, and perfect credibility is out of reach. Human reasoning, exercised through critical thinking, remains the most reliable safeguard against infox (fake news). Hence the importance of making media literacy an institutionalized and commonplace practice in all areas of life: at school, at work and... in algorithms.

Illustration: Falkenpost - Pixabay

[1] IREX, 'Can Machine Learning Help Us Measure the Trustworthiness of News?', IREX, accessed 8 July 2019,
http://www.irex.org/resource/can-machine-learning-help-us-measure-trustworthiness-news

[2] Valentin Cimino, 'Twitter just bought Fabula AI to fight fake news', Siècle Digital (blog), 4 June 2019,
https://siecledigital.fr/2019/06/04/twitter-vient-de-racheter-fabula-ai-pour-lutter-contre-les-fake-news/

[3] Edd Gent, 'This Startup Is Training AI to Gobble Up the News and Rewrite It Free of Bias', Singularity Hub (blog), 16 April 2018,
https://singularityhub.com/2018/04/16/this-startup-is-training-ai-to-gobble-up-the-news-and-rewrite-it-free-of-bias/

[4] Dave Gershgorn, 'In the Fight against Fake News, Artificial Intelligence Is Waging a Battle It Cannot Win', Quartz, accessed 8 July 2019,
https://qz.com/843110/can-artificial-intelligence-solve-facebooks-fake-news-problem/

[5] Karen Hao, 'AI Is Still Terrible at Spotting Fake News', MIT Technology Review, accessed 8 July 2019,
https://www.technologyreview.com/s/612236/even-the-best-ai-for-spotting-fake-news-is-still-terrible/
