Software Heritage


Software Heritage is an initiative whose goal is to collect, preserve, and share software source code, whether or not it is freely licensed, in a universal software archive.

History

The initiative was started in 2015, following two years of work as a research project. Software Heritage began public operations on June 30, 2016. It was formed under the auspices of the French Institute for Research in Computer Science and Automation (Inria), which hosts the initiative on its servers. Inria is providing a budget of €500,000 over three years for the project.
Software Heritage was founded by computer scientists Roberto Di Cosmo and Stefano Zacchiroli. As of July 2016, its archive held more than 20 million software projects, comprising over 2.7 billion unique source files.
Additional sponsors of the Software Heritage initiative include Microsoft and Data Archiving and Networked Services (DANS) of the Royal Netherlands Academy of Arts and Sciences and the Netherlands Organisation for Scientific Research. The project has been endorsed by Creative Commons, the Free Software Foundation, GitHub, Jason Scott, the Linux Foundation, and Microsoft, among others.

Overview

Software Heritage aims to preserve free and open-source software in its original source code form. The initiative collects, preserves, and shares software relevant to cultural heritage, industry, education, science, and research, out of concern that the technical and scientific knowledge embodied in software will be lost if it is not preserved. The project arose from the view that source code is even more vulnerable to corruption and obsolescence than traditional archival holdings such as books, video, and film.
The archive's interface is built with open-source software and initially focused on search, letting users look up content by its SHA-1 hash. The archive is open to scientific researchers and is intended to serve as a Library of Alexandria for software. It is also meant to be an infrastructure on which developers can build their own applications. Another goal is to gather guidance from researchers on which features would be most valuable, to help structure the archive's output and the curation of its collections.
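To illustrate what searching by SHA-1 hash involves, the following minimal Python sketch computes the hexadecimal SHA-1 digest of a file's raw contents, the kind of key such a lookup takes. It assumes the archive is queried with a plain SHA-1 of the file's bytes; other hashing conventions (Git's object hashing, for example) produce different digests for the same file. The function names `sha1_of_bytes` and `sha1_of_file` are hypothetical and are not part of any Software Heritage interface.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha1_of_bytes(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of raw bytes."""
    return hashlib.sha1(data).hexdigest()


def sha1_of_file(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hexadecimal SHA-1 digest of a file's raw contents.

    The file is read in fixed-size chunks so that large source files
    do not have to fit in memory at once.
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Hypothetical usage: print the digest of this script itself.
    print(sha1_of_file(__file__))
```

Hashing in chunks gives the same digest as hashing the whole file at once, since SHA-1 processes its input as a stream.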
Other grass-roots initiatives include archivist Jason Scott's Textfiles.com project, the Code Archive, and the Internet Archive's Wayback Machine. Software Heritage gathers freely licensed software from sources including GitHub, the Debian package archive, and the GNU Project's FTP archive, as well as from defunct hosting platforms such as Gitorious and Google Code.
The archive is structured to preserve knowledge and provide continuous access to digital information, and to serve as a building block for thematic portals and software collections. The initiative can also help industry, where original source code has often been lost, to build better software. By ensuring the long-term preservation of software, Software Heritage is intended to make software provenance more traceable, integrated, and reusable, helping users determine licensing and usage constraints, track security vulnerabilities, and discover existing code assets.

Awards

In 2016, Software Heritage received the Best Community Project award at the Paris Open Source Summit.
In 2019, Software Heritage received the Academic Initiative award from Pôle Systematic.