Spinning up the pulpdist project

One novel aspect of the pulpdist project is that it is starting with an almost completely blank slate from a technology point of view (aside from the decision to use Pulp as the main component of the mirroring network). Red Hat does have development standards for internal projects, of course (especially in the messaging space), but they're fairly flexible, leaving the individual tool development teams with a lot of options. If something ships with Fedora and/or RHEL, or is available under licensing terms that would be acceptable for inclusion in Fedora (and subsequently RHEL), then it's fair game.

This post focuses on the design of the management server. I'll write up a separate post looking at the currently planned design for the Pulp data transfer plugins.

Source Control

Unsurprisingly, Red Hat's internal processes are heavily influenced by Linux kernel processes. Accordingly, the source control tool of choice for new projects is Git. While I have a slight preference for Mercurial (due mainly to familiarity), I'm happy enough with any DVCS, so Git it is.

Primary Development Language

Python, of course. You don't hire a CPython core developer to get them to work on a Ruby or Perl project (although the current system I'm replacing was written in Perl). As a web application, there will naturally be some Javascript and CSS involved as well.

Web Framework

The main management application for pulpdist is going to be a full-scale web application. User profiles and authentication, database storage, communication with other web services, provision of a REST API, integration with the engineering tools messaging bus. Basically, micro-frameworks need not apply.

While I expect Pyramid/Pylons would also have been able to do the job, I decided to go with Django 1.3. This was heavily influenced by social factors: I know a lot of Django devs that I can bug for advice, but the same is not true for Pyramid. The complexity of the whole Pyramid/Pylons/TurboGears setup is also not appealing - while veteran web developers may find the "you decide" approach a selling point, Django's batteries included approach makes it far simpler to get started quickly, and decide as I go along which pieces I should keep, discard or replace.

I've heard some experienced Django developers muttering complaints about the class based views design in 1.3, but as someone coming in that is an experienced Python developer, but a relatively noobish web developer, the CBV approach seems eminently sensible, while the old function based approach looks repetitive and insane. Object oriented programming was invented for a reason!

I'll admit that my perception may be biased by knowing exactly how to make multiple inheritance work the way I want it to, though :)

Web Server

The management server doesn't actually have that much work to do, so the basic Apache+mod_wsgi configuration will serve as an adequate starting point (any heavy lifting will be done by the individual Pulp instances, and the main data traffic on those doesn't run through their web service). WSGI provides the flexibility to revisit this later if needed.

I've also punted on any web caching questions for now - the management server is low traffic and once the access to the Pulp sites is pushed out to a backend service, it should be fast enough at least for the early iterations.

Authentication & Authorisation

The actual user authentication task will be handed off to Apache and all management application access will be restricted to Kerberos authenticated users over SSL. Django's own permissions systems will be used to handle authorisation restrictions. (The experimental prototype will use Basic Auth instead, since it is the Apache/Django integration the prototype needs to cover, not the Apache configuration for SSL and Kerberos authentication)

Integration with Pulp's user access controls is via OAuth, but the design for configuration of user permissions in the Pulp servers is still TBD.

Database and ORM

Again, the management server isn't doing the heavy lifting in this application. The Pulp instances use MongoDB, but for the management server I currently plan to use the standard Django ORM backed by PostgreSQL. For the prototype instance, the database is actually just an SQLite3 file. I'm not quite sold on this one as yet - it's tempting to start playing with SQLAlchemy, since I've already had to hack around some of the limitations in the native ORM in order to store encrypted fields. OTOH, I already have a ton of things to do on this project, so messing with this is a long way down the priority list.

Schema and data maintenance is handled using South.

HTML Templating

The standard Django templating engine should be sufficient for my needs. As with the ORM, it's tempting to look into upgrading it to something like Jinja2, but once again 'good enough' is likely to be the deciding factor.

For data table display, I'm using Django Tables 2 and form display will use Django Uni-Form.


The REST API for the service is currently there primarily as a development aid - it lets me publish the full data model to the web as soon as it stabilises (and even while its still in flux), even if the UI for end users hasn't been fully defined. This is particularly useful for the metadata coming back from the Pulp server, since it doesn't need much post-processing to be included as raw data in the management server's own REST API. The JSON interface will also allow much of the backend processing to be fully exercised by the test suite without worrying about web UI details.

The design of the REST API was heavily influenced by this Lessons Learned piece from the RHEV-M developers. The Django Rest Framework means I can just define the data I want to display as a list or dictionary and the framework takes care of formatting it nicely, including rendering URLs as hyperlinks.

AMQP Messaging

I haven't actually started on this aspect in any significant way, but the two main contenders I've identified are python-qpid (which is what Pulp uses) and django-celery (which would also give me an internal task queue engine, which the management server is going to need - the prototype just does everything in the Django process, which is OK for experimentation on the LAN, but clearly inadequate long term when talking to multiple sites distributed around the planet). At this early stage, I expect the internal task management aspect is going to tip the decision in favour of the latter.

Testing Regime

As the foundation for the automated testing, I'm going with Django Sane Testing (mainly based on the example of other internal Django projects). Michael Foord's mock module lets me run at least some of the tests without relying on an external Pulp instance (fortunately, the namespace conflict with Fedora's RPM building utility 'mock' was recently resolved with the latter's support library being renamed to 'mockbuild').

Continuous integration is an open question at this point. Pulp uses Jenkins for CI and I'm inclined to follow their lead. The other main possibility is to use Beaker, Red Hat's internal test system originally set up for kernel testing (one key attraction Beaker offers is the ability to set up multi-server multi-site testing in a test recipe so I can run tests over the internal WAN).


Tito is a tool for generating SRPMs and RPMs directly from a Git repository. For my own packages, this is the approach I'm using (with handcrafted spec files). For some strange reason, the sysadmins around here like it when internal devs provide things as pre-packaged RPMs for deployment :)

Packaging of upstream PyPI dependencies that aren't available as Fedora or RHEL packages is still a work in progress. I experimented with Tito and git submodules (which doesn't work) and git subtrees (which does work, but is seriously ugly). My next attempt is likely to be based on py2pack, so we'll see how that goes (I actually discovered that project by searching for 'cpanspec pypi' after hearing some of the Perl folks here extolling the virtues of cpanspec for easily packaging CPAN modules as RPMs).

I also need to switch to using virtualenv to get a clearer distinction between Fedora packages I added via yum install and stuff I picked up directly from PyPI with pip.


Comments powered by Disqus