Package Forge

This introduction provides the background and motivation for developing the Package Forge system. It also states the main requirements for the project and gives an overview of the design of Package Forge. Here are quick links to the sections:

  1. Background and Motivation
  2. Requirements
  3. Design Overview

Background and Motivation

Within the computing team of the University of Edinburgh School of Informatics we have long recognised the need for a mechanism which can automatically build software from source on multiple platforms with minimal input and minimal effort from the user.

Currently we are required to have multiple machines available to which computing officers (COs) have login access to enable them to build packages. Due to the effort required in switching from one platform to the next (e.g. SL5 to SL6), we typically have 2 supported platforms at any one time and we always support both the i386 and x86_64 architectures. This means that we normally need 4 build hosts and, in transition periods, we may need 6. These servers must all have the necessary set of development software packages either installed or ready to hand (e.g. installable by yum for Redhat systems). This creates a dependency on the availability of quite a few servers, adds a maintenance burden and, worst of all, requires the CO to log in up to 6 times, install their source code, build the software and then submit the results. This is a manual process which requires the CO to monitor it for errors and provide regular interactive feedback to move it on to the next stage. To add to these problems, there are often subtle differences between platforms and environments which can be troublesome during the build process or result in broken packages being generated.

At various times some COs have attempted to create scripts to work around much of this overhead, with varying degrees of success. It is relatively easy to write a script which uses ssh to log in and work through the steps, checking for errors at each stage. Such a script can even be made to do the work in parallel across the build hosts to save waiting time. This certainly leads to a more reliable and reproducible build process but it is not immune to issues such as a build host being unavailable. Nor does it cope with the situation where a build host is already working flat-out to build some other piece of software; when something is not urgent it may be better to wait until sufficient resources become available. A further issue is that we regularly find platforms or specific architectures being missed by COs through simple forgetfulness or a lack of awareness. Packages are often only built for the commonly used platforms, or the results are only partially submitted to the central package repository with some sub-packages being lost in the process. Furthermore, even with a script to handle the multiple logins, it still needs to be updated when new platforms are introduced. It would be much better if a single system existed which had full knowledge of the supported platforms and could build the software appropriately.

A number of systems (often referred to as "build farms") already exist, such as Redhat's koji, which can achieve much of what we require. At the start of this project some time was spent investigating and evaluating these systems but none was found to meet all of our requirements. It is not the aim of this document to discuss the ways in which they fall short; the focus will be solely on the PkgForge system which was developed.

One useful thing which came out of this initial investigation was the discovery of mock, the Redhat package build tool. Mock manages a small chroot into which build-requirements are automatically installed using yum, and the source package is then built using the standard rpmbuild tool. This helps to build packages in a reliable and reproducible fashion without the need for all the build dependencies to be installed beforehand. It also permits the building of packages for different platforms and architectures on a single machine, provided they are reasonably compatible. In practice this tends to mean that a single x86_64 machine can be used to build packages for both i386 and x86_64 for the current Redhat/Fedora platform and for any other Redhat/Fedora platform which was released prior to the current one. It is sometimes possible to build packages for newer platforms (e.g. F14 on F13) but there is little guarantee that this will work; in particular, we have encountered problems when important libraries such as rpmlib have API changes.

One limitation of the mock system is that for any particular platform/architecture configuration only one build can be in operation at a time. This means that although it is excellent for single users it is not so useful for multiple users, who end up having to wait for previous users to complete their work. This is a fundamental issue related to the use of chroots. It can, of course, be worked around by each user having their own configurations and build directories, but that returns the user to maintaining their own build environment configurations. This shows that some sort of automated queueing mechanism is essential. A CO should be able to fire off their source package to be built and forget about it until a final status report is issued; it should not require any sort of frequent, interactive checking to see if the build system is idle and ready to do work.

One further nice feature of mock is that it can generate a yum repository from the results of previous build runs. This means that it is possible to immediately use the results of one build to satisfy the build-requirements of the next build. We often have to build large sets of software just to satisfy the build-requirements of one particular piece of software, so this feature could be very useful.

It is worth noting at this point the factors which limit the speed at which a package can be built on multiple platforms using mock. The main factor limiting the number of mock build processes which can run concurrently on a single system is the number of processors/cores available; ideally there should be a processor per mock build process, and anything less results in a very slow build. A secondary factor is the disk, which the mock build process uses rather intensively, mainly due to the various caching which is done. Building within a local disk is more-or-less essential to achieve a decent level of performance, and having fast disks, possibly even a separate disk per build process, makes builds go a lot faster.

Further to the investigations into koji and mock, the build systems for other Linux distributions were examined to see what features they provided. The Debian build infrastructure is of particular interest as it provides many of the features that we require. Although there did not appear to be much opportunity for code reuse, it has provided a great deal of inspiration for the design of a distributed build system which handles many different architectures.

For efficiency, the CO team in Informatics generally prefer to work from the command line, so appropriate tools will have to be provided. Many of the CO team regularly work from a variety of locations, including at home as well as in the office. This means that the tools should be installable and runnable on a wide variety of systems (preferably including non-RPM based systems such as Debian). There is also the need to be able to submit source packages across foreign networks. As well as providing the ability to submit from anywhere, an interface will be needed to provide feedback on the build results and access to the log files. This is probably best done through a web interface.

Any system developed will need to be designed such that it can cope with builders for target platforms being unavailable for a period of time. Any source packages submitted during a period of builder unavailability should be queued until such time as the builder becomes available again. This should not require any explicit actions from the users. It should also be possible to continue to submit source packages even when the processing of new submissions is not active. Any submitted jobs should be stored for later processing and queueing when the master comes back online.

To make the system sufficiently robust it should be designed so that it is not overly complex. In part, this can be achieved by using existing infrastructure wherever applicable. For instance, the source packages can simply be submitted into a directory where the COs have write access rather than developing a bespoke submission system with its own protocol and file transmission mechanism.

Where possible the intention is very much that the design should make use of existing infrastructure and tools rather than completely re-inventing the wheel. In Informatics there are existing authentication systems (e.g. Kerberos and cosign for the web) and an existing network file-system (openafs) which can be utilised. As previously mentioned, there are build tools such as mock which will do the bulk of the work. This does mean that some decisions will be made which do not suit external users, but it is hoped that workarounds will always be possible. For example, AFS could be replaced with NFS, or submission could be restricted to a certain host writing into its local file-system. Going further, the design should make it possible to write, for example, a submission tool which uses sftp.

As well as utilising existing infrastructure the aim is to use existing software modules wherever possible. The bulk of the code will be written using Perl in an object-oriented style. For this, the current "best practice" recognised by the wider Perl community is to use Moose. In general, standard modules from CPAN will be sought whenever possible. The aim is that the code should also reach certain minimal quality standards. This can be achieved by standardising the formatting using perltidy and checking with perlcritic. To ensure that the code works as expected, and continues to do so as Perl and the various modules continue to develop, a comprehensive test suite will also be essential. As we are aiming to support a wide range of platforms the test suite will be an important part of porting to any new operating system where the versions of Perl and the required modules may differ quite substantially.
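
As a rough illustration of the intended object-oriented Perl style, a minimal Moose-based class might look something like the following. This is a sketch only; the package and attribute names are purely illustrative and not part of the actual PkgForge code.

    # Illustrative sketch only: the package and attribute names are hypothetical.
    package PkgForge::Example::Job;
    use Moose;

    # A unique identifier for the build job.
    has 'id' => (
        is       => 'ro',
        isa      => 'Str',
        required => 1,
    );

    # The names of the source packages included in the job.
    has 'packages' => (
        is      => 'ro',
        isa     => 'ArrayRef[Str]',
        default => sub { [] },
    );

    # Making the class immutable is standard Moose best practice.
    __PACKAGE__->meta->make_immutable;

    no Moose;
    1;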

Although the motivation for the development of the PkgForge system has come from within the computing team in the School of Informatics, it has always been part of the considerations that the software developed should be useful to other external groups (both within the University and elsewhere). With this in mind, the intention was that any system developed should be reasonably platform agnostic and extensible to suit the needs of others. Although we currently only use systems which are RPM-based, it may well be that at some point in the future we need the ability to build software automatically on MacOSX or Debian, so that possibility should not be deliberately excluded.

Requirements

Having taken into account the background and motivation previously discussed, a number of specific requirements were formulated. (These are not in any particular order of merit.)

  1. It must be possible for the user to submit source packages in a very simple and straightforward fashion which requires minimal effort. In particular, there must be a command-line submission tool which makes the setting of any options, etc., a trivial process.
  2. The build process must be totally automated and not require any interactive feedback from the user.
  3. If a build is successful then all build products must be automatically submitted to the correct location (the package "bucket" in Informatics terminology).
  4. The build environment must be consistent and the build process must be reliable and reproducible.
  5. It must be possible for a user to submit a job with little or no prior knowledge of the supported platforms (other than the type of package required, e.g. SRPM). However, it must also be possible to restrict the set of target platforms when building on some of them makes no sense (e.g. certain software will not build on the x86_64 architecture).
  6. It must be possible to have multiple machines servicing a single target platform. This is to allow the efficient handling of large scale rebuilds; in particular, it will be essential when Informatics ports all of its local software to a new platform.
  7. The system should be extensible so that support can be added for platforms which do not use the RPM package format.
  8. It should be possible for external users to deploy the system with minimal effort. To achieve this all the configuration data must be in configuration files and nothing specific to the School of Informatics should be hardwired into the code.
  9. It should be possible to submit a set of source packages all at once for building as a single set. The results of building each piece of software in the set would then become available for the rest of the set. Only if all software in the set builds successfully would anything be submitted.
  10. Existing infrastructure should be used where possible to minimise the amount of new development work that needs to be done for the system.
  11. There must be a comprehensive test suite.
  12. The code must meet certain quality standards and be formatted in a standard way. The coding style should follow recognised Perl best practices.
  13. Wherever possible existing libraries should be used to aid portability and reduce the amount of new code which has to be developed and maintained.

Design Overview

Based on the background, motivations and requirements which have already been discussed, an outline design was developed. This is a simple high-level overview of the system; each of its parts will be discussed in greater detail in subsequent sections.

Build Jobs

Central to the design of the PkgForge system is the concept of a build job, which is what the user submits. A build job has a unique identifier and consists of a set of one or more source packages along with some instructions which tell the system how and where the packages should be built and submitted.

A build job is submitted as a directory which contains all the source packages and a meta-data file. This directory is placed into a standard location in the filesystem where the user has write access. In Informatics the filesystem used is AFS so that a build job may be submitted from anywhere.
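
As an illustration, a job might be assembled and dropped into the submission directory along the following lines. The AFS path, file names and meta-data format below are assumptions made for the sketch, not the real PkgForge layout.

    #!/usr/bin/perl
    # Hypothetical sketch of assembling a build job for submission. The AFS
    # path and the meta-data format are assumptions, not the real layout.
    use strict;
    use warnings;
    use File::Basename qw(basename);
    use File::Copy qw(copy);
    use File::Path qw(make_path);

    my $job_id = 'example-20110101-0001';                        # unique identifier
    my $jobdir = "/afs/example.org/pkgforge/incoming/$job_id";   # assumed location

    make_path($jobdir);

    # Copy the source packages (given on the command line) into the job directory.
    for my $srpm (@ARGV) {
        copy( $srpm, "$jobdir/" . basename($srpm) )
            or die "Cannot copy $srpm: $!\n";
    }

    # Write a simple meta-data file describing how the job should be built.
    open my $meta, '>', "$jobdir/metadata" or die "Cannot write metadata: $!\n";
    print {$meta} "id: $job_id\n";
    print {$meta} "platforms: all\n";    # build on every supported platform
    print {$meta} "report: email\n";     # request an email report on completion
    close $meta;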

The Registry

The registry is used to store all the information necessary for scheduling tasks. It contains information about the current set of supported platforms, the build daemons for those platforms and the list of jobs along with their associated tasks. The status of each task and job is tracked as it progresses through its lifecycle and logs are kept of when jobs have been attempted. It deliberately does not contain all the information submitted by the user, which is held in the job meta-data file. When a build daemon needs the extra information (such as the list of source packages to be built) it loads the job data directly from the file. This makes it possible to easily extend the set of information in the meta-data file whenever necessary without having to alter the database schema.

The registry is stored in a PostgreSQL database. The incoming queue processor and build daemons communicate directly with this database; there is no bespoke communication layer. It is unlikely that the system will work as it stands with any other database type. All of the access controls, the locking and some of the scheduling code are written in PL/pgSQL. A deliberate decision was made to utilise these features of the database as they are likely to be more reliable and more performant than any similar features which could be added to the PkgForge code.
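
As a rough sketch of that direct-to-database approach, a daemon might query the registry using DBI along these lines. The connection details, table and column names here are invented for the example and do not reflect the real schema.

    # Hypothetical sketch of a daemon talking directly to the registry database.
    # The connection details, table and column names are invented for the example.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'dbi:Pg:dbname=pkgforge;host=registry.example.org',
        'pkgforge',
        $ENV{PKGFORGE_DB_PASS},
        { RaiseError => 1, AutoCommit => 1 },
    );

    # List the currently active target platforms.
    my $platforms = $dbh->selectall_arrayref(
        'SELECT name, arch FROM platform WHERE active ORDER BY name, arch',
        { Slice => {} },
    );

    for my $p ( @{$platforms} ) {
        print "$p->{name}/$p->{arch}\n";
    }

    $dbh->disconnect;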

Incoming Job Processor

There is a single daemon (which runs on the server often referred to as the master) which regularly checks the directory into which new jobs are submitted. Whenever a new job directory is detected the job is registered, processed and, if it validates successfully, transferred to a different secure location and queued for the builder daemons. Various validity checks are carried out; for example, source RPMs are checked to ensure they are valid SRPMs.

The queueing of a job is done by constructing the set of target platforms based on what is available and what the user has requested. Once the set of platforms is computed a separate build task is scheduled for the job on each platform. The job and the scheduled tasks are added to the central registry. Only the core information required for the scheduling of jobs/tasks is copied from the submitted job meta-data into the registry.
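
A rough sketch of that queueing step is shown below. The table names, column names and the methods on the job object are invented for the example; the real code differs.

    # Hypothetical sketch of queueing a validated job: one build task is created
    # for each target platform that is available and not excluded by the user.
    # Table names, column names and the job object's methods are invented.
    sub queue_job {
        my ($dbh, $job) = @_;

        # The currently available target platforms.
        my $platforms = $dbh->selectall_arrayref(
            'SELECT id, name, arch FROM platform WHERE active',
            { Slice => {} },
        );

        $dbh->begin_work;

        # Only the core scheduling information is copied into the registry.
        $dbh->do('INSERT INTO job (uuid, submitter) VALUES (?, ?)',
                 undef, $job->id, $job->submitter);

        for my $p (@{$platforms}) {
            # Respect any restrictions the user placed on the target platforms.
            next unless $job->wants_platform( $p->{name}, $p->{arch} );

            $dbh->do('INSERT INTO task (job, platform, status) VALUES (?, ?, ?)',
                     undef, $job->id, $p->{id}, 'needsbuild');
        }

        $dbh->commit;
        return;
    }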

The Build Daemons

Each active target platform has a set of zero or more build daemons. The build daemons do the actual work of building packages from the submitted sources. When there are no build daemons in operation, new tasks will be queued awaiting the addition of a daemon. Each build daemon must be entered into the registry before it can begin accepting tasks. An entry in the registry does not guarantee that the build daemon is in any way live and operational; it is purely used to record which platform/architecture the build daemon supports.

The task scheduling is very basic. Each time the queue is checked it is ordered by submission date and time and the oldest task is selected. For the sake of sanity, the table is locked during this process so any other daemons for the same platform have to wait to retrieve new tasks. If a builder does not complete a task (neither success nor failure) it may put the task back onto the queue. A task may be attempted multiple times, by multiple build daemons, and may even be retried after a build failure. The build log tracks every attempt to build a task.
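
The selection step might look roughly like the following sketch. The table and column names are invented for the example, and in the real system much of this logic lives inside the database itself.

    # Hypothetical sketch of a builder fetching the oldest queued task for its
    # platform. The table and column names are invented for the example.
    sub next_task {
        my ($dbh, $platform_id) = @_;

        $dbh->begin_work;

        # Lock the task table so that other daemons for the same platform
        # wait until this selection has finished.
        $dbh->do('LOCK TABLE task IN EXCLUSIVE MODE');

        # Pick the oldest queued task for this platform.
        my $task = $dbh->selectrow_hashref(
            'SELECT id, job FROM task
              WHERE platform = ? AND status = ?
              ORDER BY submitted ASC
              LIMIT 1',
            undef, $platform_id, 'needsbuild',
        );

        # Mark it as taken before releasing the lock.
        $dbh->do('UPDATE task SET status = ? WHERE id = ?',
                 undef, 'building', $task->{id}) if $task;

        $dbh->commit;

        return $task;    # undef when the queue is empty
    }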

When a new task is accepted by a build daemon it goes through several phases. Initially the list of source packages is filtered to retrieve only those which are applicable (e.g. SRPMs for a builder that uses mock/rpmbuild). After that an attempt is made to build each source package in turn, following the original sequence specified by the submitter. The behaviour upon failure is controllable by the user: it can either be an immediate failure of the whole task, or the source package can be put to the end of the list of sources and tried again later. The RPM builder uses the second approach by default so that it can handle a set of source packages which have build-requirements upon each other. After the package building phase there is an optional checking stage, which is done to ensure the validity of the generated packages. Finally, if all source packages build correctly and pass the checks, the packages will be submitted to the appropriate repository using pkgsubmit. In all cases, after completion of a task the log files are stored in a central "results" directory which is accessible from the web interface. Optionally the build daemons can generate reports at the end of the build process (for example sending an email to the user who submitted the job).
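
A simplified sketch of the filter and build phases, showing the default retry-at-the-end behaviour, is given below. The builder object and its methods are invented for the example.

    # Hypothetical sketch of the package-building phase. Sources which fail to
    # build are pushed to the back of the queue and retried later, so that a set
    # of packages with build-requirements on each other can be built in one task.
    # The builder object and its methods are invented for the example.
    sub build_all {
        my ($builder, @sources) = @_;

        # Filter phase: keep only the packages this builder can handle.
        my @queue   = grep { $builder->can_build($_) } @sources;
        my $retries = scalar @queue;    # cap the total number of retries

        while (@queue) {
            my $source = shift @queue;

            if ( $builder->build($source) ) {
                # Success: make the result available to later builds in this task.
                $builder->update_local_repo($source);
            }
            elsif ( $retries-- > 0 ) {
                push @queue, $source;   # try again once other sources have built
            }
            else {
                return 0;               # the task fails if a source never builds
            }
        }

        return 1;    # everything built; checking and submission come next
    }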

The Web Interface

The first iteration of the web interface is fairly basic. It is used to display the list of jobs and associated tasks a user has submitted and the current status of each. For each task it is possible to retrieve log files and the generated packages (when a task has succeeded).

In the future the web interface may gain the capability to modify the registry. This would allow a user to cancel and retry jobs and tasks. It may also gain a feature for uploading jobs, which would be very useful when a secure networked filesystem is not available.