Software repository
A software repository, or “repo” for short, is a storage location for software packages. A table of contents is often stored alongside the packages, together with metadata. Repositories group packages: sometimes by programming language, such as CPAN for the Perl programming language, sometimes for an entire operating system, and sometimes the license of the contents is the criterion.
On the client side, a package manager helps install packages from the repositories and keep them updated.
On the server side, a software repository is typically managed by source control or by repository managers. Some repository managers can aggregate several repository locations behind one URL and provide a caching proxy. Continuous builds produce many artifacts, which are often stored centrally, so automatically deleting those that are not released is important.
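The automatic cleanup of unreleased build artifacts can be illustrated with a minimal sketch. The `Artifact` record, the `select_for_deletion` function, and the 14-day retention window are assumptions made for illustration; they do not correspond to the interface of any particular repository manager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record for one stored build artifact."""
    name: str
    version: str
    released: bool
    built_at: datetime


def select_for_deletion(artifacts, now, max_age=timedelta(days=14)):
    """Return unreleased artifacts older than max_age.

    Released artifacts are never selected, whatever their age.
    """
    return [
        a for a in artifacts
        if not a.released and now - a.built_at > max_age
    ]


# Illustrative data: one release and two continuous-build snapshots.
now = datetime(2024, 1, 31)
store = [
    Artifact("app", "1.0.0", True, datetime(2023, 6, 1)),
    Artifact("app", "1.1.0-build17", False, datetime(2024, 1, 2)),
    Artifact("app", "1.1.0-build42", False, datetime(2024, 1, 30)),
]
expired = select_for_deletion(store, now)
```

In this sketch only the older snapshot build is selected; the released version is kept regardless of age, and the recent snapshot is still inside the retention window.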
Overview
Many software publishers and other organizations maintain servers on the Internet for this purpose, either free of charge or for a subscription fee. Repositories may be solely for particular programs, such as CPAN for the Perl programming language, or for an entire operating system. Operators of such repositories typically provide a package management system: tools intended to search for, install and otherwise manipulate software packages from the repositories. For example, many Linux distributions use the Advanced Packaging Tool (APT), commonly found in Debian-based distributions, or yum, found in Red Hat-based distributions. There are also multiple independent package management systems, such as pacman, used in Arch Linux, and equo, found in Sabayon Linux.
As software repositories are designed to include useful packages, major repositories are designed to be malware free. If a computer is configured to use a digitally signed repository from a reputable vendor, and this is coupled with an appropriate permissions system, the threat of malware to that system is significantly reduced. As a side effect, many systems that have these capabilities do not require anti-malware software such as anti-virus software.
Most major Linux distributions have many repositories around the world that mirror the main repository.
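Part of what makes a signed repository safe to mirror is that a client can check a downloaded package against a digest listed in a trusted index. The sketch below shows only that digest comparison, using SHA-256 from the Python standard library. It assumes the index has already been authenticated (verifying the signature on the index itself is omitted), and the names `verify_package` and `trusted_index` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_package(name: str, payload: bytes, trusted_index: dict) -> bool:
    """Accept a download only if its digest matches the trusted index.

    trusted_index maps package file names to expected SHA-256 digests.
    It is assumed to come from an index whose signature was already checked.
    """
    expected = trusted_index.get(name)
    return expected is not None and sha256_hex(payload) == expected


# Illustrative data: the index lists one package and its digest.
original = b"example package contents"
trusted_index = {"hello_1.0.pkg": sha256_hex(original)}

ok = verify_package("hello_1.0.pkg", original, trusted_index)
tampered = verify_package("hello_1.0.pkg", b"modified contents", trusted_index)
```

Under these assumptions a package fetched from any mirror is accepted only if its contents match what the main repository published.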
Package management system vs. package development process
A package management system is different from a package development process. A typical use of a package management system is to facilitate the integration of code from possibly different sources into a coherent stand-alone operating unit. Thus, a package management system might be used to produce a distribution of Linux, possibly a distribution tailored to a specific restricted application.
A package development process, by contrast, is used to manage the co-development of code and documentation of a collection of functions or routines with a common theme, producing thereby a package of software functions that typically will not be complete and usable by themselves. A good package development process will help users conform to good documentation and coding practices, integrating some level of unit testing. The table below provides examples of package development processes.
Selected repositories
The following table lists a few languages with repositories for contributed software. The "Autochecks" column describes the routine checks done.
Very few people have the ability to test their software under multiple operating systems with different versions of the core code and with other contributed packages they may use. For R, the Comprehensive R Archive Network (CRAN) runs tests routinely. To see how this is valuable, suppose Sally contributes a package A. Sally only runs the current version of the software under one version of Microsoft Windows, and has only tested it in that environment. At more or less regular intervals, CRAN tests Sally's contribution under a dozen combinations of operating systems and versions of the core R language software. If one of them generates an error, she gets that error message. With luck, that error message may suffice to allow her to fix the error, even if she cannot replicate it with the hardware and software she has. Next, suppose John contributes to the repository a package B that uses package A. Package B passes all the tests and is made available to users. Later, Sally submits an improved version of A, which unfortunately breaks B. The autochecks make it possible to provide information to John so he can fix the problem.
This example exposes both a strength and a weakness in the R contributed-package system: CRAN supports this kind of automated testing of contributed packages, but packages contributed to CRAN need not specify the versions of other contributed packages that they use. Procedures for requesting specific versions of packages exist, but contributors might not use those procedures.
Beyond this, a repository such as CRAN that runs regular checks of contributed packages effectively provides an extensive, if ad hoc, test suite for development versions of the core language. If Sally gets an error message she does not understand or thinks is inappropriate, especially from a development version of the language, she can ask the language's core development team for help. In this way, the repository can contribute to improving the quality of the core language software.
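The situation with packages A and B suggests one thing an automated check needs to know: which packages may be affected when a given package changes. The following is a minimal sketch of that lookup over a dependency map. It is not a description of how CRAN is implemented; the function name and the packages C and D are invented for illustration.

```python
def reverse_dependencies(depends_on, package):
    """Return every package that directly or indirectly depends on `package`.

    depends_on maps each package name to the set of packages it uses.
    """
    affected = set()
    frontier = [package]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


# B uses A (as in the example above); C and D are hypothetical additions.
depends_on = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "D": set(),
}
to_recheck = reverse_dependencies(depends_on, "A")
```

When Sally submits a new version of A, a repository following this approach would rerun the checks for B and for C, which uses B, and would report any new failures to their maintainers.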
Language / purpose | Package Development Process | Repository | Install methods | Collaborative development platform | Autochecks |
Haskell | Common Architecture for Building Applications and Libraries | Hackage | |||
Java | Maven | ||||
Julia | |||||
Common Lisp | Quicklisp | ||||
.NET | NuGet | NuGet | |||
Node.js | NPM | ||||
Perl | CPAN | PPM | |||
PHP | PEAR, Composer | PECL, Packagist | |||
Python | Setuptools | PyPI | pip, EasyInstall, PyPM, Anaconda | ||
R | R CMD check process | CRAN | install.packages | R-Forge | Roughly weekly on 12 platforms or combinations of different versions of R with up to 7 different operating systems. |
Ruby | RubyGems | Ruby Application Archive | RubyForge | ||
Rust | Cargo | Crates | Cargo | ||
TeX, LaTeX | CTAN |
Many other programming languages, among them C, C++, and Fortran, do not possess a central software repository with universal scope. Notable repositories with limited scope include:
- Netlib, mainly mathematical routines for Fortran and C, historically one of the first open software repositories;
- Boost, a strictly curated collection of high-quality libraries for C++; some code developed in Boost later became part of the C++ standard library.
Package managers
Package Manager | Description |
NPM | A package manager for Node.js |
pip | A package installer for Python |
APT | A package manager for Debian packages |
Homebrew | A package installer for macOS that allows users to install packages that Apple did not include |
Repository managers
Relationship to continuous integration
As part of the development lifecycle, source code is continuously built into binary artifacts using continuous integration (CI). The CI server may interact with a binary repository manager much as a developer would, by getting artifacts from the repositories and pushing builds there. Tight integration with CI servers enables the storage of important metadata such as:
- Which user triggered the build
- Which modules were built
- Which sources were used
- Dependencies used
- Environment variables
- Packages installed
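As a rough illustration, the items in this list could be collected into a single record that a CI job attaches to the artifacts it pushes. The `BuildInfo` class, its field names, and the sample values below are hypothetical and do not follow the schema of any specific CI server or repository manager.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BuildInfo:
    """Hypothetical build record mirroring the metadata items listed above."""
    triggered_by: str
    modules: list
    sources: list
    dependencies: list
    environment: dict = field(default_factory=dict)
    packages_installed: list = field(default_factory=list)


# Illustrative values only.
info = BuildInfo(
    triggered_by="alice",
    modules=["core", "cli"],
    sources=["revision abc123"],
    dependencies=["libfoo 2.1.0"],
    environment={"JAVA_HOME": "/opt/jdk"},
    packages_installed=["make", "gcc"],
)

# Serialize to JSON so the record can be stored next to the artifact.
record = json.dumps(asdict(info), sort_keys=True)
```

Storing such a record with each build makes it possible to trace later which sources, dependencies and environment produced a given artifact.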
Artifacts and packages
Compared to source files, binary artifacts are often larger by orders of magnitude, are rarely deleted or overwritten, and are usually accompanied by extensive metadata such as an ID, package name, version and license.
Metadata
Metadata describes a binary artifact, is stored and specified separately from the artifact itself, and can have several additional uses. The following table shows some common metadata types and their uses:
Metadata type | Used for |
Versions available | Upgrading and downgrading automatically |
Dependencies | Specify other artifacts that the current artifact depends on |
Downstream dependencies | Specify other artifacts that depend on the current artifact |
License | Legal compliance |
Build date and time | Traceability |
Documentation | Provide offline availability for contextual documentation in IDEs |
Approval information | Traceability |
Metrics | Code coverage, compliance to rules, test results |
User-created metadata | Custom reports and processes |
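The first row of the table, using the list of available versions to upgrade automatically, can be sketched as follows. The sketch assumes purely numeric dotted version strings such as "1.10.0"; real versioning schemes often add pre-release tags and other rules that it does not handle. The function names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def best_upgrade(installed, available):
    """Return the newest available version above `installed`, or None."""
    newer = [v for v in available if parse_version(v) > parse_version(installed)]
    return max(newer, key=parse_version) if newer else None


# Illustrative "versions available" metadata for one artifact.
available = ["1.2.0", "1.10.0", "1.9.0"]
choice = best_upgrade("1.9.0", available)
no_choice = best_upgrade("2.0.0", available)
```

Comparing the parsed tuples rather than the raw strings matters here: as plain text, "1.9.0" sorts after "1.10.0", which would hide the available upgrade.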
Products providing repository management
Software to manage repositories includes:
- Apache Archiva "repository management software build artifact repository"
- Azure Artifacts
- CloudRepo "Fully managed, cloud based, private and public repositories."
- Cloudsmith "The new standard in Package Management and Software Distribution."
- Dist "Reliable, secure, private, and fast Docker Container Registries and Maven Repositories hosted in the cloud."
- Inedo's ProGet "Universal Package Manager. World-class features. Accessible for everyone."
- feedz.io "Package Hosting and Distribution"
- GitHub Package Registry
- JFrog's Artifactory
- MyGet "continuous delivery service hosting 1000s of NuGet, Bower and NPM package repositories"
- Packagecloud "A unified, developer friendly interface for all of your artifacts."
- Package Drone "a package manager repository for OSGi"
- Sonatype's Nexus: works with build tools like Ant, Ivy, Gradle, Maven, SBT among others.
- Pulp "free and open source platform for managing repositories of software packages and making it available to large numbers of consumers. Supported types: RPM, Python, Puppet, Docker and OSTree."