Published October 26, 2021. Updated November 4, 2021.


The Daily Loss of Information from the Net is Colossal

If you want to see a historian cry, particularly one who specializes in the ancient world, mention the Library of Alexandria. This almost mythical institution held an immense amount of the knowledge accumulated in antiquity. Unfortunately, it was destroyed by an invader, taking centuries of knowledge with it.

In theory, many see the Internet as the new utopia of knowledge, a place where information is accumulated and stored. In practice, we have all come across a page displaying the infamous 404 error, meaning that the content has been moved or deleted: a corner of the Internet rendered obsolete and possibly lost for good. And it happens far more often than we may realize.

Preventing the Erosion of the Net

We tend to be dazzled by the vast amount of content added every day, and overlook how much is removed in the meantime. And it is not just posts on social networks. A 2015 article in The Atlantic recalled that an investigative work by a Pulitzer Prize-nominated journalist, a series of 34 in-depth articles about a tragedy in Colorado, ended up disappearing from the web. As an Internet historian points out in that article, it is always possible to decipher, at least in part, a burnt piece of paper. A site, by contrast, can disappear leaving virtually no trace.

Thankfully, some people quickly realized that the global network would need archivists, and they now work daily to archive pages, images, public messages, and more. Time is of the essence: a team of researchers found that, on social networks in particular, about 11% of shared resources are lost within a year of publication, and that a further 0.02% disappears every day after that first anniversary. This means that part of the resources documenting milestones like the Arab Spring, the November 2015 Paris attacks, or the 2014 World Cup is gone forever.
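
To get a feel for what such figures imply, here is a minimal Python sketch. It assumes a simple linear model (a fixed loss at the first anniversary, then a constant number of percentage points lost per day, capped at 100%). This is an illustration, not the researchers' own method, and the function name and parameters are hypothetical.

```python
def estimated_loss_percent(days_since_publication: int,
                           first_year_loss: float,
                           daily_loss_after: float) -> float:
    """Rough share (in %) of shared resources lost after a given number of days.

    Assumption: linear growth after the first anniversary, capped at 100%.
    The quoted figures say nothing about losses before day 365, so earlier
    dates are rejected rather than guessed.
    """
    if days_since_publication < 365:
        raise ValueError("figures only describe losses from the first anniversary on")
    extra_days = days_since_publication - 365
    return min(100.0, first_year_loss + daily_loss_after * extra_days)
```

Feeding it the first-year and daily figures quoted above gives a rough idea of how much of what was shared around a given event may already be gone.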

In light of this, web archaeologists are trying to preserve what gets published online. The task is colossal, and it is practically impossible to save everything that is put on the Internet every day. Still, they try, with the help of bots that automatically collect pages. The best known of these efforts, the Internet Archive, was born out of this desire to preserve as much as possible before it disappears. Its Wayback Machine lets Internet users revisit old versions of sites. Interested in seeing Thot Cursus in 2008?
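
For readers who want to look up an old version themselves, the Wayback Machine serves captures at addresses of the form web.archive.org/web/<timestamp>/<original URL> and redirects to the capture closest to the requested timestamp. The short Python sketch below only builds such an address. The helper name and the example.org address are placeholders, not taken from the article.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for page_url around the moment `when`."""
    # The Wayback Machine expects a YYYYMMDDhhmmss timestamp in the path.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Placeholder site; replace with the page you want to revisit.
    print(wayback_url("https://example.org/", datetime(2008, 1, 1)))
```

Opening the resulting address in a browser shows the archived copy nearest to that date, if one exists.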

Since then, other players have joined the archiving dance, such as PageFreezer or the U.S. Library of Congress, which has decided to play the role of net archivist as well. On the French side, the Bibliothèque Nationale de France (BnF) has been doing the same for the past few years.

A Duty to Remember

It might seem futile to archive a medium as teeming as the Internet. After all, what good would it do? Yet more and more professionals have to work with archives. In the news media, reviewing old statements is always important for context. Public figures increasingly use the Internet to express themselves, leaving behind interesting, shocking or contradictory words that need to be retrievable. For fact-checking, this is very convenient. For example, in 2018, Donald Trump complained that Google had not promoted his State of the Union speech on its homepage while it had done so for Obama. It only took a little digging in the archives to show that the claim was false and that Google had indeed highlighted the event that day.

For researchers, particularly in the humanities, these archives are a colossal body of material to analyze. They can filter it according to their field of interest, spot behaviors or movements, and collect and visualize them. Cloud computing makes searching the archives even easier. Hence the importance for archivists of not only accumulating data but also categorizing it well and providing tools so that scholars can easily pinpoint a specific period, particular terms, etc.

Consider the COVID-19 period, which will have left a lasting mark on the start of the 2020s. Archivists are already amassing as much as they can, both in the field and on the Internet. This will be the most publicized pandemic in history, and one that has also generated a great deal of debate and misinformation. Moreover, at the end of 2020 the Wayback Machine began offering fact-checking of archived pages, showing whether a site or video has been removed for spreading misleading information.

Citizen Archivists

While many bots and humans are working to save sites from complete extinction, Internet users can help by doing the same, since there are tools for archiving sites yourself. Wallabag is software that lets you keep a page and read it again in an intelligible form even years later. The program does not necessarily keep the container, however, but rather the content.

To keep the container as well, it's best to turn to Conifer, which saves the desired pages in WARC format, as the Internet Archive does, for example. The site offers a free 5-gigabyte account, and you'll have to pay for more storage. Nevertheless, it can be worthwhile if you are afraid of losing a particular page. Finally, ArchiveBox is open-source software for self-hosted web archiving. It sets up a server of your own on which you can store as much content as you like.

Archiving the Internet is not just about making fun of the awful layouts of the past. It offers researchers a colossal database for analyzing parts of modern history or 21st-century sociology. Archiving also makes it possible to preserve teaching materials and pedagogical approaches. How many of them are lost in the ocean of Internet information when they could be useful to young teachers? Net archivists therefore have their work cut out for them to ensure that this mass of knowledge does not disappear.

Illustration: Markus Spiske on Unsplash


"3 Outils Très Pratiques Pour Archiver Sa Vie Numérique !" Dbeley. Last updated March 15, 2021.

Cuneo, François. "Internet Archive Et Son WayBackMachine, Toute L’histoire Du Web Sous L’index." Le Blog Du Cuk. Last updated March 9, 2021.

Ferreira, Elsa. "L’inéluctable Désintégration D’Internet Et Les Archivistes Du Web." CTRLZ. Last updated July 16, 2021.

Gelinas, James. "The Wayback Machine Will Now Fact-check Archived Websites and Articles." Last updated November 4, 2020.

LaFrance, Adrienne. "Raiders of the Lost Web." The Atlantic. Last updated October 14, 2015.

Lo, Saliou. "L'archivage Du Web Par La BnF: Sauvegarde Et Valorisation De La Mémoire En Ligne." Métiers Des Archives Et Des Bibliothèques : Médiation De L'histoire Et Humanités Numériques. Last updated January 13, 2021.

Puren, Marie. "L’archivage du Web." Archive Ouverte HAL. Last updated November 13, 2020.

Ruest, Nick, Samantha Fritz, Jimmy Lin, and Ian Milligan. "From Archive to Analysis: Accessing Web Archives at Scale Through a Cloud-based Interface." International Journal of Digital Humanities. Last updated January 6, 2021.

Shreffler, Stephanie. "The Internet Archive Has Been Fighting for 25 Years to Keep What's on the Web from Disappearing - and You Can Help." The Conversation. Last updated August 13, 2021.

Spinney, Laura. "What are COVID archivists keeping for tomorrow's historians?" Nature. Last updated December 23, 2020.

Turbé, Sebastien. "5 Archives En Ligne Pour Visualiser Les Anciennes Versions D'un Site." Codeur Mag. Last updated April 23, 2021.

Vlassenroot, Eveline, Sally Chambers, Sven Lieber, Alejandra Michel, Friedel Geeraert, Jessica Pranger, Julie Birkholz, and Peter Merchant. "Web-archiving and Social Media: an Exploratory Analysis." International Journal of Digital Humanities. Last updated June 22, 2021.
