In early April 2016, a media bombshell went off around the world. The International Consortium of Investigative Journalists (ICIJ) unveiled the Panama Papers, exposing the banking secrets of hundreds of individuals accused of tax evasion. The revelations embarrassed sports stars, business leaders and even politicians. Iceland's Prime Minister, for example, was forced to resign amid public anger at what the journalists had uncovered.
As a video from the German daily Süddeutsche Zeitung recalls, it all began in the spring of 2015 with an anonymous message sent to the newspaper. In free translation, it read: "Hello. My name is John Doe. Interested in data? There are a few conditions. My life is in danger. We will only discuss through encrypted channels. No meetings. The subject of the articles is up to you."
When the newspaper asked how much data there was, the whistleblower claimed it would be the largest leak ever...
A titanic task
He wasn't lying. The newspaper, which shared the material with the Consortium, received 2.6 terabytes of data: 11.5 million documents on thousands of shell companies, spanning the period from 1977 to 2015. That is ten times more than the material provided by Edward Snowden, and there were only 370 journalists around the world to analyze it all! The task was all the harder because nothing could leak in the meantime: the investigations had to stay secret so that the first articles and reports on the Panama Papers could be published simultaneously.
Since the release, reporters have looked into how this mass of information was analyzed. The teams involved quickly realized that working with the raw data would be impossible. The Consortium therefore had to classify the millions of items and store them on cloud servers accessible only through secured connections.
ICIJ's technical team inventoried the different types of documents (e-mails, databases, PDFs, images and other text files) and then the entities mentioned in them (nominees, company names, individuals, banks, etc.) using two open-source Apache tools: Tika, which extracts text and metadata from files, and Solr, which indexes that text so it can be searched. Then, with the graph-visualization software Linkurious, the journalists were finally able to start investigating and to map visually the links between institutions, individuals and so on.
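Linkurious is a commercial product, and the sketch below does not use it or reproduce how ICIJ configured it. It only illustrates, under stated assumptions, the idea behind a graph tool: once people, companies and intermediaries are stored as nodes and their relationships as edges, a chain of links between two names can be found with a breadth-first search. The records and entity names (`Person A`, `Shell Co 1`, `Intermediary Z`, etc.) are invented placeholders, not data from the leak.

```python
from collections import deque

# Hypothetical records of the form (entity, relation, entity).
# These are invented placeholders, NOT data from the Panama Papers.
EDGES = [
    ("Person A", "shareholder_of", "Shell Co 1"),
    ("Nominee X", "director_of", "Shell Co 1"),
    ("Shell Co 1", "client_of", "Intermediary Z"),
    ("Person B", "beneficiary_of", "Shell Co 2"),
    ("Shell Co 2", "client_of", "Intermediary Z"),
]


def build_graph(edges):
    """Build an undirected adjacency list so links can be followed both ways."""
    graph = {}
    for src, _relation, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set()).add(src)
    return graph


def shortest_link(graph, start, goal):
    """Breadth-first search: return the shortest chain of entities from
    start to goal, or None if the two are not connected."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # remembers how each node was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through 'previous' to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):  # sorted for reproducible output
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(EDGES)
print(shortest_link(graph, "Person A", "Person B"))
# ['Person A', 'Shell Co 1', 'Intermediary Z', 'Shell Co 2', 'Person B']
```

In this toy example, two people who never appear in the same document turn out to be connected through a shared intermediary, which is the kind of pattern a visual graph makes easy to spot.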
With this groundwork done, all they had to do was type in the names of the wealthiest families, politicians and other public figures to see within minutes whether they were beneficiaries of, or otherwise involved in, these offshore arrangements.
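Solr is built on Apache Lucene, whose core data structure is an inverted index: a map from each word to the documents that contain it, which is what makes a name search across millions of files return in seconds. The following is a minimal sketch of that idea in plain Python, not Solr's API or ICIJ's setup; the document IDs and texts stand in for text that a tool such as Tika would extract and are invented placeholders.

```python
import re
from collections import defaultdict

# Hypothetical extracted texts keyed by document ID (invented placeholders).
DOCUMENTS = {
    "doc1.eml": "Please register Shell Co 1 with Nominee X as director.",
    "doc2.pdf": "Certificate of incorporation for Shell Co 2. Beneficiary: Person B.",
    "doc3.txt": "Invoice addressed to Person A for services to Shell Co 1.",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs containing it (an inverted index)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every word of the query
    (boolean AND; word order and adjacency are not checked)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(DOCUMENTS)
print(sorted(search(index, "Person B")))    # ['doc2.pdf']
print(sorted(search(index, "Shell Co 1")))  # ['doc1.eml', 'doc3.txt']
```

A production search engine adds much more on top (phrase queries, relevance ranking, fuzzy matching for misspelled names), but the lookup principle is the same: the index is built once, and each query only intersects a few precomputed sets.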
A new type of journalism
The Panama Papers story is not over. More articles and revelations are expected in the coming weeks and months, which is only logical given the sheer volume of documents leaked. Some observers, however, are already taking stock of the investigation, which, while highly successful, has exposed some of the limits of this kind of journalism.
As Le Monde points out in a behind-the-scenes article, the Consortium will need to develop more effective tools for collaboration between media outlets in different countries. For example, ICIJ wanted to crowdsource the work of identifying the real people behind nominees, to make fellow journalists' work easier. Because the task proved too cumbersome, this never materialized, so journalists had to do the verification on their own, which slowed down both the investigation and the writing. As a result, some stories may have been overlooked.
Moreover, while business and research increasingly rely on tools for analyzing Big Data, journalism has yet to adopt them systematically. Most journalists do not know these tools or find them practically out of reach, even though journalism is one of the fields that would benefit most from them. This is what ICIJ is currently working on.
It seems clear that information workers will increasingly have to deal with massive databases. A few years ago, there was already talk of training journalists to specialize solely in reading and mining data. Failing that, investigative journalists will at least have to learn how to navigate this information: classify it, verify it and so on.
If today's and tomorrow's journalists are to cover volumes of data beyond what any one person can take in, the communications industry and the schools that train future communicators will have to take up the challenge, teaching data analysis as well as ways for colleagues from different media and countries to work together. It is a way to give the watchdogs of our societies sharper fangs.
Illustration: ProStockStudio, shutterstock
References
Abbruzzese, Jason. "400 Reporters Kept the Panama Papers Secret for a Year. Here's How They Pulled It Off." Mashable. Last updated April 4, 2016. http://mashable.com/2016/04/04/panama-papers-media.
Baruch, Jérémie, and Maxime Vaudano. "'Panama Papers': A Technical Challenge for Data Journalism." J'ai Du Bon Data. Last updated April 8, 2016. http://data.blog.lemonde.fr/2016/04/08/panama-papers-un-defi-technique-pour-le-journalisme-de-donnees/.
Heymann, Sébastien. "Panama Papers: How Linkurious Enables ICIJ to Investigate the Massive Mossack Fonseca Leaks." Linkurious. Last updated April 5, 2016. http://linkurio.us/panama-papers-how-linkurious-enables-icij-to-investigate-the-massive-mossack-fonseca-leaks/.
Cabra, Mar, and Erin Kissane. "The People and Tech Behind the Panama Papers." Source. Last updated April 11, 2016. https://source.opennews.org/en-US/articles/people-and-tech-behind-panama-papers/.
Roberge, Alexandre. "Data Journalism, A New Profession?" Thot Cursus. Last updated January 15, 2014. http://cursus.edu/dossiers-articles/articles/21048/journalisme-donnees-nouveau-metier.
Woodie, Alex. "Inside the Panama Papers: How Cloud Analytics Made It All Possible." Datanami. Last updated April 7, 2016. http://www.datanami.com/2016/04/07/inside-panama-papers-cloud-analytics-made-possible/.
Apache Solr. http://lucene.apache.org/solr/.
Apache Tika. https://tika.apache.org/.