A Debian package contains some metadata about the packaged program. For instance, its homepage URL is stored in the source package control file (debian/control) and propagated through the Debian source control file (.dsc). The problem with this approach is that to update the metadata, one has to update the whole package.

Today, thousands of Debian source packages are developed in a version control system, most often Subversion or Git. The repository's URL is also propagated through the control files mentioned above. It is then possible to monitor the main branch to detect and propagate changes without needing to upload to the Debian archive.

In 2009, I proposed to centralise metadata about upstream in a YAML-formatted file, debian/upstream. We are now using it in the Debian Med project to propagate bibliographic references to the manuscripts describing the packaged programs. The references can be seen on our web sentinels for our metapackages. The data passes through the Ultimate Debian Database (UDD).

Let's imagine that the concept catches on and that thousands of packages provide a debian/upstream file. How can we avoid thousands of daily requests to Alioth to keep the database up to date?

I am developing a system called Umegaya, for Umegaya is a MEtadata GAtherer using YAml. Umegaya provides a web interface with a simple URL structure to retrieve data. For instance, http://upstream-metadata.debian.net/emboss/reference-year returns 2000, the year in which the first scientific article describing EMBOSS was published. If, at the time of retrieval, the previous update is more than one hour old, the system reads the package's debian/upstream file again. In other words, reading the data is what keeps it up to date: packages that nobody queries generate no requests. Conversely, to update the database after modifying debian/upstream, one just has to access it.
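The refresh-on-read idea can be sketched in a few lines of Python. This is only an illustration and not Umegaya's actual code: the class name RefreshOnReadCache, the fetch callback (standing in for whatever reads debian/upstream from the repository) and the in-memory dictionary are all assumptions of mine. The sketch also assumes that a stale entry is refreshed before the answer is returned, which the description above does not specify.

```python
import time
from typing import Callable, Dict, Optional, Tuple

MAX_AGE_SECONDS = 3600  # one hour, as in the description above


class RefreshOnReadCache:
    """Hypothetical cache that re-reads a package's metadata only when queried."""

    def __init__(
        self,
        fetch: Callable[[str], Dict[str, str]],
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch(package) is an assumed callback returning the parsed
        # debian/upstream fields of that package as a flat dictionary.
        self._fetch = fetch
        self._clock = clock
        # package name -> (time of last update, metadata fields)
        self._store: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def get(self, package: str, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._store.get(package)
        # Unknown package, or last update older than one hour: read again.
        if entry is None or now - entry[0] > MAX_AGE_SECONDS:
            entry = (now, self._fetch(package))
            self._store[package] = entry
        return entry[1].get(key)
```

With such a design, a package that is queried a hundred times within an hour causes a single read of its repository, and a package that is never queried causes none.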

Umegaya is still a draft, and many things may change. But the service has already been up for more than a year at the address upstream-metadata.debian.net. It is used to fill the Subversion repository called packages-metadata, which gathers the debian/upstream, debian/control and debian/copyright files from packages uploaded over the past few months. One can see there that among the 3,646 copyright files, 1,218 declare themselves conformant to the machine-readable format 1.0.
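For readers who wonder how such a count can be made: a machine-readable copyright file announces its format in a Format field in its first paragraph. Here is a minimal sketch of that check, which is not necessarily how the figures above were obtained. The function name declares_format_1_0 is hypothetical, and the sketch assumes that the Format field fits on a single line.

```python
import re

# URI of the machine-readable copyright format 1.0 (http or https,
# trailing slash optional).
FORMAT_1_0 = re.compile(
    r"^https?://www\.debian\.org/doc/packaging-manuals/copyright-format/1\.0/?$"
)


def declares_format_1_0(copyright_text: str) -> bool:
    """Return True if the header paragraph has a Format field pointing to 1.0."""
    for line in copyright_text.strip().splitlines():
        if not line.strip():
            break  # a blank line ends the header paragraph
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "format":
            return bool(FORMAT_1_0.match(value.strip()))
    return False
```

Applying this function to every debian/copyright file in a checkout of the repository and counting the True results would give the kind of tally quoted above.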

Because I am very fond of the principle "package what you use, use what you package", umegaya entered the Debian archive a few days ago.