27 languages to improve your Python

As a co-designer of one of the world's most popular programming languages, one of the more frustrating behaviours I regularly see (both in the Python community and in others) is influential people trying to tap into fears of "losing" to other open source communities as a motivating force for community contributions. (I'm occasionally guilty of this misbehaviour myself, which makes it even easier to spot when others are falling into the same trap).

While learning from the experiences of other programming language communities is a good thing, fear based approaches to motivating action are seriously problematic, as they encourage community members to see members of those other communities as enemies in a competition for contributor attention, rather than as potential allies in the larger challenge of advancing the state of the art in software development. It also has the effect of telling folks that enjoy those other languages that they're not welcome in a community that views them and their peers as "hostile competitors".

In truth, we want there to be a rich smorgasbord of cross platform open source programming languages to choose from, as programming languages are first and foremost tools for thinking - they make it possible for us to convey our ideas in terms so explicit that even a computer can understand them. If someone has found a language to use that fits their brain and solves their immediate problems, that's great, regardless of the specific language (or languages) they choose.

So I have three specific requests for the Python community, and one broader suggestion. First, the specific requests:

  1. If we find it necessary to appeal to tribal instincts to motivate action, we should avoid using tribal fear, and instead aim to use tribal pride. When we use fear as a motivator, as in phrasings like "If we don't do X, we're going to lose developer mindshare to language Y", we're deliberately creating negative emotions in folks freely contributing the results of their work to the world at large. Relying on tribal pride instead leads to phrasings like "It's currently really unclear how to solve problem X in Python. If we look to ecosystem Y, we can see they have a really nice approach to solving problem X that we can potentially adapt to provide a similarly nice user experience in Python". Actively emphasising taking pride in our own efforts, rather than denigrating the efforts of others, helps promote a culture of continuous learning within the Python community and also encourages the development of ever improving collaborative relationships with other communities.
  2. Refrain from adopting attitudes of contempt towards other open source programming language communities, especially if those communities have empowered people to solve their own problems rather than having to wait for commercial software vendors to deign to address them. Most of the important problems in the world aren't profitable to solve (as the folks afflicted by them aren't personally wealthy and don't control institutional funding decisions), so we should be encouraging and applauding the folks stepping up to try to solve them, regardless of what we may think of their technology choices.
  3. If someone we know is learning to program for the first time, and they choose to learn a language we don't personally like, we should support them in their choice anyway. They know what fits their brain better than we do, so the right language for us may not be the right language for them. If they start getting frustrated with their original choice, to the point where it's demotivating them from learning to program at all, then it makes sense to start recommending alternatives. This advice applies even for those of us involved in improving the tragically bad state of network security: the way we solve the problem with inherently insecure languages is by improving operating system sandboxing capabilities, progressively knocking down barriers to adoption for languages with better native security properties, and improving the default behaviours of existing languages, not by confusing beginners with arguments about why their chosen language is a poor choice from an application security perspective. (If folks are deploying unaudited software written by beginners to handle security sensitive tasks, it isn't the folks writing the software that are the problem, it's the folks deploying it without performing appropriate due diligence on the provenance and security properties of that software)

My broader suggestion is aimed at folks that are starting to encounter the limits of the core procedural subset of Python and would hence like to start exploring more of Python's own available "tools for thinking".

One of the things we do as part of the Python core development process is to look at features we appreciate having available in other languages we have experience with, and see whether or not there is a way to adapt them to be useful in making Python code easier to both read and write. This means that learning another programming language that focuses more specifically on a given style of software development can help improve anyone's understanding of that style of programming in the context of Python.

To aid in such efforts, I've provided a list below of some possible areas for exploration, and other languages which may provide additional insight into those areas. Where possible, I've linked to Wikipedia pages rather than directly to the relevant home pages, as Wikipedia often provides interesting historical context that's worth exploring when picking up a new programming language as an educational exercise rather than for immediate practical use.

While I do know many of these languages personally (and have used several of them in developing production systems), the full list of recommendations includes additional languages that I only know indirectly (usually by either reading tutorials and design documentation, or by talking to folks that I trust to provide good insight into a language's strengths and weaknesses).

There are a lot of other languages that could have gone on this list, so the specific ones listed are a somewhat arbitrary subset based on my own interests (for example, I'm mainly interested in the dominant Linux, Android and Windows ecosystems, so I left out the niche-but-profitable Apple-centric Objective-C and Swift programming languages, and I'm not familiar enough with art-focused environments like Processing to even guess at what learning them might teach a Python developer). For a more complete list that takes into account factors beyond what a language might teach you as a developer, IEEE Spectrum's annual ranking of programming language popularity and growth is well worth a look.

Procedural programming: C, Rust, Cython

Python's default execution model is procedural: we start at the top of the main module and execute it statement by statement. All of Python's support for the other approaches to data and computational modelling covered below is built on this procedural foundation.

The C programming language is still the unchallenged ruler of low level procedural programming. It's the core implementation language for the reference Python interpreter, and also for the Linux operating system kernel. As a software developer, learning C is one of the best ways to start learning more about the underlying hardware that executes software applications - C is often described as "portable assembly language", and one of the first applications cross-compiled for any new CPU architecture will be a C compiler.

Rust, by contrast, is a relatively new programming language created by Mozilla. The reason it makes this list is because Rust aims to take all of the lessons we've learned as an industry regarding what not to do in C, and design a new language that is interoperable with C libraries, offers the same precise control over hardware usage that is needed in a low level systems programming language, but uses a different compile time approach to data modelling and memory management to structurally eliminate many of the common flaws afflicting C programs (such as buffer overflows, double free errors, null pointer access, and thread synchronisation problems). I'm an embedded systems engineer by training and initial professional experience, and Rust is the first new language I've seen that looks like it may have the potential to scale down to all of the niches currently dominated by C and custom assembly code.

Cython is also a lower level procedural-by-default language, but unlike general purpose languages like C and Rust, Cython is aimed specifically at writing CPython extension modules. To support that goal, Cython is designed as a Python superset, allowing the programmer to choose when to favour the pure Python syntax for flexibility, and when to favour Cython's syntax extensions that make it possible to generate code that is equivalent to native C code in terms of speed and memory efficiency.

Learning one of these languages is likely to provide insight into memory management, algorithmic efficiency, binary interface compatibility, software portability, and other practical aspects of turning source code into running systems.

Object-oriented data modelling: Java, C#, Eiffel

One of the main things we need to do in programming is to model the state of the real world, and offering native syntactic support for object-oriented programming is one of the most popular approaches for doing that: structurally grouping data structures, and methods for operating on those data structures into classes.

Python itself is deliberately designed so that it is possible to use the object-oriented features without first needing to learn to write your own classes. Not every language adopts that approach - those listed in this section are ones that consider learning object-oriented design to be a requirement for using the language at all.

After a major marketing push by Sun Microsystems in the mid-to-late 1990's, Java became the default language for teaching introductory computer science in many tertiary institutions. While it is now being displaced by Python for many educational use cases, it remains one of the most popular languages for the development of business applications. There are a range of other languages that target the common JVM (Java Virtual Machine) runtime, including the Jython implementation of Python. The Dalvik and ART environments for Android systems are based on a reimplementation of the Java programming APIs.

C# is similar in many ways to Java, and emerged as an alternative after Sun and Microsoft failed to work out their business differences around Microsoft's Java implementation, J++. Like Java, it's a popular language for the development of business applications, and there are a range of other languages that target the shared .NET CLR (Common Language Runtime), including the IronPython implementation of Python (the core components of the original IronPython 1.0 implementation were extracted to create the language neutral .NET Dynamic Language Runtime). For a long time, .NET was a proprietary Windows specific technology, with mono as a cross-platform open source reimplementation, but Microsoft shifted to an open source ecosystem strategy in early 2015.

Unlike most of the languages in this list, Eiffel isn't one I'd recommend for practical day-to-day use. Rather, it's one I recommend because learning it taught me an incredible amount about good object-oriented design where "verifiably correct" is a design goal for the application. (Learning Eiffel also taught me a lot about why "verifiably correct" isn't actually a design goal in most software development, as verifiably correct software really doesn't cope well with ambiguity and is entirely unsuitable for cases where you genuinely don't know the relevant constraints yet and need to leave yourself enough wiggle room to be able to figure out the finer details through iterative development).

Learning one of these languages is likely to provide insight into inheritance models, design-by-contract, class invariants, pre-conditions, post-conditions, covariance, contravariance, method resolution order, generic programming, and various other notions that also apply to Python's type system. There are also a number of standard library modules and third party frameworks that use this "visibly object-oriented" design style, such as the unittest and logging modules, and class-based views in the Django web framework.
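
For instance, the unittest module expects this "visibly object-oriented" style, with related test cases grouped as methods on a class. Here's a minimal sketch (the class, method and data names are purely illustrative):

import unittest

class TestGreeting(unittest.TestCase):
    """Related test cases are grouped as methods on a TestCase subclass"""

    def setUp(self):
        # Runs before every test method, establishing shared preconditions
        self.name = "Python"

    def test_greeting_includes_name(self):
        self.assertIn(self.name, "Hello, Python!")

if __name__ == "__main__":
    unittest.main()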

Object-oriented C derivatives: C++, D

One way of using the CPython runtime is as a "C with objects" programming environment - at its core, CPython is implemented using C's approach to object-oriented programming, which is to define C structs to hold the data of interest, and to pass in instances of the struct as the first argument to functions that then manipulate that data (these are the omnipresent PyObject* pointers in the CPython C API). This design pattern is deliberately mirrored at the Python level in the form of the explicit self and cls arguments to instance methods and class methods.
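
To make that parallel a little more concrete, here's a small sketch (the class and method names are purely illustrative) showing how the explicit self and cls parameters line up with C's convention of passing the relevant struct pointer as the first function argument:

class Counter:
    count = 0  # class level data, accessed via the class object

    def __init__(self):
        self.count = 0  # instance level data, stored on the instance

    def increment(self):
        # "self" is passed explicitly, just as CPython's C API functions
        # accept a PyObject* as their first parameter
        self.count += 1
        return self.count

    @classmethod
    def describe(cls):
        # "cls" plays the same role for operations on the class itself
        return "Counter instances start at %d" % cls.count

c = Counter()
print(Counter.increment(c))  # calling through the class makes the parallel explicit
print(c.increment())         # the usual bound method call does the same thing
print(Counter.describe())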

C++ is a programming language that aimed to retain full source compatibility with C, while adding higher level features like native object-oriented programming support and template based metaprogramming. It's notoriously verbose and hard to program in (although the 2011 update to the language standard addressed many of the worst problems), but it's also the language of choice in many contexts, including 3D modelling graphics engines and cross-platform application development frameworks like Qt.

The D programming language is also interesting, as it has a similar relationship to C++ as Rust has to C: it aims to keep most of the desirable characteristics of C++, while also avoiding many of its problems (like the lack of memory safety). Unlike Rust, D was not a ground up design of a new programming language from scratch - instead, D is a close derivative of C++, and while it isn't a strict C superset as C++ is, it does follow the design principle that any code that falls into the common subset of C and D must behave the same way in both languages.

Learning one of these languages is likely to provide insight into the complexities of combining higher level language features with the underlying C runtime model. Learning C++ is also likely to be useful when using Python to manipulate existing libraries and toolkits written in C++.

Array-oriented data processing: MATLAB/Octave, Julia

Array oriented programming is designed to support numerical programming models: those based on matrix algebra and related numerical methods.

While Python's standard library doesn't support this directly, array oriented programming is taken into account in the language design, with a range of syntactic and semantic features being added specifically for the benefit of the third party NumPy library and similarly array-oriented tools.
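
The dedicated matrix multiplication operator added in Python 3.5 is one example of such a feature: it exists almost entirely for the benefit of third party array-oriented libraries. A short sketch (it assumes the third party NumPy library is installed):

import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(a * b)    # elementwise multiplication
print(a @ b)    # matrix multiplication, via the operator added in Python 3.5
print(a[:, 0])  # multidimensional slicing, another array-oriented language feature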

In many cases, the Scientific Python stack is adopted as an alternative to the proprietary MATLAB programming environment, which is used extensively for modelling, simulation and numerical data analysis in science and engineering. GNU Octave is an open source alternative that aims to be syntactically compatible with MATLAB code, allowing folks to compare and contrast the two approaches to array-oriented programming.

Julia is another relatively new language, which focuses heavily on array oriented programming and type-based function overloading.

Learning one of these languages is likely to provide insight into the capabilities of the Scientific Python stack, as well as providing opportunities to explore hardware level parallel execution through technologies like OpenCL and Nvidia's CUDA, and distributed data processing through ecosystems like Apache Spark and the Python-specific Blaze.

Statistical data analysis: R

As access to large data sets has grown, so has demand for capable freely available analytical tools for processing those data sets. One such tool is the R programming language, which focuses specifically on statistical data analysis and visualisation.

Learning R is likely to provide insight into the statistical analysis capabilities of the Scientific Python stack, especially the pandas data manipulation library and the seaborn statistical visualisation library.
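
As a small taste of the kind of workflow R users will recognise, here's a pandas sketch (the column names and values are purely illustrative, and the third party pandas library needs to be installed):

import pandas as pd

observations = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# Summary statistics, similar in spirit to R's summary()
print(observations["value"].describe())

# Split-apply-combine, similar in spirit to R's aggregate() and dplyr's group_by()
print(observations.groupby("group")["value"].mean())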

Computational pipeline modelling: Haskell, Scala, Clojure, F#

Object-oriented data modelling and array-oriented data processing focus a lot of attention on modelling data at rest, either in the form of collections of named attributes or as arrays of structured data.

By contrast, functional programming languages emphasise the modelling of data in motion, in the form of computational flows. Learning at least the basics of functional programming can help greatly improve the structure of data transformation operations even in otherwise procedural, object-oriented or array-oriented applications.

Haskell is a functional programming language that has had a significant influence on the design of Python, most notably through the introduction of list comprehensions in Python 2.0.
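
For example, a filtered transformation that needs an explicit loop in purely procedural code can be written as a single declarative expression using the Haskell-inspired comprehension syntax:

# Squares of the odd numbers below ten, written as a procedural loop...
squares = []
for x in range(10):
    if x % 2:
        squares.append(x * x)

# ...and as a Haskell-inspired list comprehension
squares = [x * x for x in range(10) if x % 2]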

Scala is an (arguably) functional programming language for the JVM that, together with Java, Python and R, is one of the four primary programming languages for the Apache Spark data analysis platform. While being designed to encourage functional programming approaches, Scala's syntax, data model, and execution model are also designed to minimise barriers to adoption for current Java programmers (hence the "arguably" - the case can be made that Scala is better categorised as an object-oriented programming language with strong functional programming support).

Clojure is another functional programming language for the JVM that is designed as a dialect of Lisp. It earns its place in this list by being the inspiration for the toolz functional programming toolkit for Python.

F# isn't a language I'm particularly familiar with myself, but seems worth noting as the preferred functional programming language for the .NET CLR.

Learning one of these languages is likely to provide insight into Python's own computational pipeline modelling tools, including container comprehensions, generators, generator expressions, the functools and itertools standard library modules, and third party functional Python toolkits like toolz.
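
As a minimal sketch of what such a pipeline can look like in Python (the data and the individual processing steps are purely illustrative):

from itertools import islice

def read_numbers(lines):
    """Lazily parse numeric values from an iterable of text lines"""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)

lines = ["1", "2", " ", "3", "4", "5"]

# Each stage is lazy, so data flows through the pipeline only on demand
values = read_numbers(lines)               # generator function
doubled = (2 * value for value in values)  # generator expression
first_three = islice(doubled, 3)           # itertools-based slicing of the stream

print(list(first_three))  # [2.0, 4.0, 6.0]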

Event driven programming: JavaScript, Go, Erlang, Elixir

Computational pipelines are an excellent way to handle data transformation and analysis problems, but many problems require that an application run as a persistent service that waits for events to occur, and then handles those events. In these kinds of services, it is usually essential to be able to handle multiple events concurrently in order to be able to accommodate multiple users (or at least multiple actions) at the same time.

JavaScript was originally developed as an event handling language for web browsers, permitting website developers to respond locally to client side actions (such as mouse clicks and key presses) and events (such as the page rendering being completed). It is supported in all modern browsers, and together with the HTML5 Document Object Model, has become a de facto standard for defining the appearance and behaviour of user interfaces.

Go was designed by Google as a purpose built language for creating highly scalable web services, and has also proven to be a very capable language for developing command line applications. The most interesting aspect of Go from a programming language design perspective is its use of Communicating Sequential Processes concepts in its core concurrency model.

Erlang was designed by Ericsson as a purpose built language for creating highly reliable telephony switches and similar devices, and is the language powering the popular RabbitMQ message broker. Erlang uses the Actor model as its core concurrency primitive, passing messages between threads of execution, rather than allowing them to share data directly. While I've never programmed in Erlang myself, my first full-time job involved working with (and on) an Actor-based concurrency framework for C++ developed by an ex-Ericsson engineer, as well as developing such a framework myself based on the TSK (Task) and MBX (Mailbox) primitives in Texas Instruments' lightweight DSP/BIOS runtime (now known as TI-RTOS).

Elixir earns an entry on the list by being a language designed to run on the Erlang VM that exposes the same concurrency semantics as Erlang, while also providing a range of additional language level features to help provide a more well-rounded environment that is more likely to appeal to developers migrating from other languages like Python, Java, or Ruby.

Learning one of these languages is likely to provide insight into Python's own concurrency and parallelism support, including native coroutines, generator based coroutines, the concurrent.futures and asyncio standard library modules, third party network service development frameworks like Twisted and Tornado, the channels concept being introduced to Django, and the event handling loops in GUI frameworks.

Gradual typing: TypeScript

One of the more controversial features that landed in Python 3.5 was the new typing module, which brings a standard lexicon for gradual typing support to the Python ecosystem.

For folks whose primary exposure to static typing is in languages like C, C++ and Java, this seems like an astoundingly terrible idea (hence the controversy).

Microsoft's TypeScript, which provides gradual typing for JavaScript applications, offers a better illustration of the concept. TypeScript code compiles to JavaScript code (which then doesn't include any runtime type checking), and TypeScript annotations for popular JavaScript libraries are maintained in the dedicated DefinitelyTyped repository.

As Chris Neugebauer pointed out in his PyCon Australia presentation, this is very similar to the proposed relationship between Python, the typeshed type hint repository, and type inference and analysis tools like mypy.

In essence, both TypeScript and type hinting in Python are ways of writing particular kinds of tests, either as separate files (just like normal tests), or inline with the main body of the code (just like type declarations in statically typed languages). In either case, you run a separate command to actually check that the rest of the code is consistent with the available type assertions (this occurs implicitly as part of the compilation to JavaScript for TypeScript, and as an entirely optional static analysis task for Python's type hinting).
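
A small sketch of what those inline type assertions look like in Python 3.5 (the function itself is illustrative; a static checker like mypy would be installed and run as a separate command, while the code runs unmodified):

from typing import Sequence

def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)

# Running the module behaves exactly as it would without the annotations
print(mean([1.0, 2.0, 3.0]))

# A separate static analysis pass (e.g. running mypy over this file) would
# flag the following call as inconsistent with the declared types:
# mean("hello")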

Dynamic metaprogramming: Hy, Ruby

A feature folks coming to Python from languages like C, C++, C# and Java often find disconcerting is the notion that "code is data": the fact that things like functions and classes are runtime objects that can be manipulated like any other object.

Hy is a Lisp dialect that runs on both the CPython VM and the PyPy VM. Lisp dialects take the "code as data" concept to extremes, as Lisp code consists of nested lists describing the operations to be performed (the name of the language itself stands for "LISt Processor"). The great strength of Lisp-style languages is that they make it incredibly easy to write your own domain specific languages. The great weakness of Lisp-style languages is that they make it incredibly easy to write your own domain specific languages, which can sometimes make it difficult to read other people's code.

Ruby is a language that is similar to Python in many respects, but as a community is far more open to making use of dynamic metaprogramming features that are "supported, but not encouraged" in Python. This includes things like reopening class definitions to add additional methods, and using closures to implement core language constructs like iteration.

Learning one of these languages is likely to provide insight into Python's own dynamic metaprogramming support, including function and class decorators, monkeypatching, the unittest.mock standard library module, and third party object proxying modules like wrapt. (I'm not aware of any languages to learn that are likely to provide insight into Python's metaclass system, so if anyone has any suggestions on that front, please mention them in the comments. Metaclasses power features like the core type system, abstract base classes, enumeration types and runtime evaluation of gradual typing expressions)
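
As a brief sketch of that "code is data" idea in Python itself (the function and class names are purely illustrative):

import functools

def announce(func):
    """A decorator: a function that accepts a function and returns a new one"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("Calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@announce
def greet(name):
    return "Hello, %s!" % name

print(greet("Python"))

# Classes are runtime objects too, so methods can be attached after the fact
# (the "monkeypatching" technique mentioned above)
class Greeter:
    pass

Greeter.greet = lambda self, name: "Hello, %s!" % name
print(Greeter().greet("Python"))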

Pragmatic problem solving: Lua, PHP, Perl

Popular programming languages don't exist in isolation - they exist as part of larger ecosystems of redistributors (both commercial and community focused), end users, framework developers, tool developers, educators and more.

Lua is a popular programming language for embedding in larger applications as a scripting engine. Significant examples include it being the language used to write add-ons for the World of Warcraft game client, and it's also embedded in the RPM utility used by many Linux distributions. Compared to CPython, a Lua runtime will generally be a tenth of the size, and its weaker introspection capabilities generally make it easier to isolate from the rest of the application and the host operating system. A notable contribution from the Lua community to the Python ecosystem is the adoption of the LuaJIT FFI (Foreign Function Interface) as the basis of the JIT-friendly cffi interface library for CPython and PyPy.
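
To give a flavour of that cffi interface, here's roughly what its minimal ABI-level usage looks like (a sketch based on the library's documented overview example: it assumes the third party cffi library is installed, and dlopen(None) loads the standard C library, so it won't work as written on Windows):

from cffi import FFI

ffi = FFI()
ffi.cdef("int printf(const char *format, ...);")  # declare the C function to call
C = ffi.dlopen(None)                               # load the standard C library
C.printf(b"Hello from C, called via cffi\n")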

PHP is another popular programming language that rose to prominence as the original "P" in the Linux-Apache-MySQL-PHP LAMP stack, due to its focus on producing HTML pages, and its broad availability on early Virtual Private Server hosting providers. For all the handwringing about conceptual flaws in various aspects of its design, it's now the basis of several widely popular open source web services, including the Drupal content management system, the Wordpress blogging engine, and the MediaWiki engine that powers Wikipedia. PHP also powers important services like the Ushahidi platform for crowdsourced community reporting on distributed events.

Like PHP, Perl rose to popularity on the back of Linux. Unlike PHP, which grew specifically as a web development platform, Perl rose to prominence as a system administrator's tool, using regular expressions to string together and manipulate the output of text-based Linux operating system commands. When sh, awk and sed were no longer up to handling a task, Perl was there to take over.

Learning one of these languages isn't likely to provide any great insight into aesthetically beautiful or conceptually elegant programming language design. What it is likely to do is to provide some insight into how programming language distribution and adoption works in practice, and how much that has to do with fortuitous opportunities, accidents of history and lowering barriers to adoption by working with redistributors to be made available by default, rather than the inherent capabilities of the languages themselves.

In particular, it may provide insight into the significance of projects like CKAN, OpenStack NFV, Blender, SciPy, OpenMDAO, PyGMO, PyCUDA, the Raspberry Pi Foundation and Python's adoption by a wide range of commercial organisations, for securing ongoing institutional investment in the Python ecosystem.

TCP echo client and server in Python 3.5

This is a follow-on from my previous post on Python 3.5's new async/await syntax. Rather than the simple background timers used in the original post, this one will look at the impact native coroutine support has on the TCP echo client and server examples from the asyncio documentation.

First, we'll recreate the run_in_foreground helper defined in the previous post. This helper function makes it easier to work with coroutines from otherwise synchronous code (like the interactive prompt):

import asyncio

def run_in_foreground(task, *, loop=None):
    """Runs event loop in current thread until the given task completes

    Returns the result of the task.
    For more complex conditions, combine with asyncio.wait()
    To include a timeout, combine with asyncio.wait_for()
    """
    if loop is None:
        loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))

Next we'll define the coroutine for our TCP echo server implementation, which simply waits to receive up to 100 bytes on each new client connection, and then sends that data back to the client:

async def handle_tcp_echo(reader, writer):
    data = await reader.read(100)
    message = data.decode()
    addr = writer.get_extra_info('peername')
    print("-> Server received %r from %r" % (message, addr))
    print("<- Server sending: %r" % message)
    writer.write(data)
    await writer.drain()
    print("-- Terminating connection on server")
    writer.close()

And then the client coroutine we'll use to send a message and wait for a response:

async def tcp_echo_client(message, port, loop=None):
    reader, writer = await asyncio.open_connection('', port, loop=loop)
    print('-> Client sending: %r' % message)
    writer.write(message.encode())
    data = (await reader.read(100)).decode()
    print('<- Client received: %r' % data)
    print('-- Terminating connection on client')
    writer.close()
    return data

We then use our run_in_foreground helper to interact with these coroutines from the interactive prompt. First, we start the echo server:

>>> make_server = asyncio.start_server(handle_tcp_echo, '')
>>> server = run_in_foreground(make_server)

Conveniently, since this is a coroutine running in the current thread, rather than in a different thread, we can retrieve the details of the listening socket immediately, including the automatically assigned port number:

>>> server.sockets[0]
<socket.socket fd=6, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('', 40796)>
>>> port = server.sockets[0].getsockname()[1]

Since we haven't needed to hardcode the port number, if we want to define a second server, we can easily do that as well:

>>> make_server2 = asyncio.start_server(handle_tcp_echo, '')
>>> server2 = run_in_foreground(make_server2)
>>> server2.sockets[0]
<socket.socket fd=7, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('', 41200)>
>>> port2 = server2.sockets[0].getsockname()[1]

Now, both of these servers are configured to run directly in the main thread's event loop, so trying to talk to them using a synchronous client wouldn't work. The client would block the main thread, and the servers wouldn't be able to process incoming connections. That's where our asynchronous client coroutine comes in: if we use that to send messages to the server, then it doesn't block the main thread either, and both the client and server coroutines can process incoming events of interest. That gives the following results:

>>> print(run_in_foreground(tcp_echo_client('Hello World!', port)))
-> Client sending: 'Hello World!'
-> Server received 'Hello World!' from ('', 44386)
<- Server sending: 'Hello World!'
-- Terminating connection on server
<- Client received: 'Hello World!'
-- Terminating connection on client
Hello World!

Note something important here: you will get exactly that sequence of output messages, as this is all running in the interpreter's main thread, in a deterministic order. If the servers were running in their own threads, we wouldn't have that property (and reliably getting access to the port numbers the server components were assigned by the underlying operating system would also have been far more difficult).

And to demonstrate both servers are up and running:

>>> print(run_in_foreground(tcp_echo_client('Hello World!', port2)))
-> Client sending: 'Hello World!'
-> Server received 'Hello World!' from ('', 44419)
<- Server sending: 'Hello World!'
-- Terminating connection on server
<- Client received: 'Hello World!'
-- Terminating connection on client
Hello World!

That then raises an interesting question: how would we send messages to the two servers in parallel, while still only using a single thread to manage the client and server coroutines? For that, we'll need another of our helper functions from the previous post, schedule_coroutine:

def schedule_coroutine(target, *, loop=None):
    """Schedules target coroutine in the given event loop

    If not given, *loop* defaults to the current thread's event loop

    Returns the scheduled task.
    """
    if asyncio.iscoroutine(target):
        return asyncio.ensure_future(target, loop=loop)
    raise TypeError("target must be a coroutine, "
                    "not {!r}".format(type(target)))

Update: As with the previous post, this post originally suggested a combined "run_in_background" helper function that handled both scheduling coroutines and calling arbitrary callables in a background thread or process. On further reflection, I decided that was unhelpfully conflating two different concepts, so I replaced it with separate "schedule_coroutine" and "call_in_background" helpers.

First, we set up the two client operations we want to run in parallel:

>>> echo1 = schedule_coroutine(tcp_echo_client('Hello World!', port))
>>> echo2 = schedule_coroutine(tcp_echo_client('Hello World!', port2))

Then we use the asyncio.wait function in combination with run_in_foreground to run the event loop until both operations are complete:

>>> run_in_foreground(asyncio.wait([echo1, echo2]))
-> Client sending: 'Hello World!'
-> Client sending: 'Hello World!'
-> Server received 'Hello World!' from ('', 44461)
<- Server sending: 'Hello World!'
-- Terminating connection on server
-> Server received 'Hello World!' from ('', 44462)
<- Server sending: 'Hello World!'
-- Terminating connection on server
<- Client received: 'Hello World!'
-- Terminating connection on client
<- Client received: 'Hello World!'
-- Terminating connection on client
({<Task finished coro=<tcp_echo_client() done, defined at <stdin>:1> result='Hello World!'>, <Task finished coro=<tcp_echo_client() done, defined at <stdin>:1> result='Hello World!'>}, set())

And finally, we retrieve our results using the result method of the task objects returned by schedule_coroutine:

>>> echo1.result()
'Hello World!'
>>> echo2.result()
'Hello World!'

We can set up as many concurrent background tasks as we like, and then use asyncio.wait as the foreground task to wait for them all to complete.
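
For example, still using the helpers and the echo servers defined above, a whole batch of client requests could be scheduled and then awaited together (this is a sketch, not output from the original interactive session):

# Schedule any number of client coroutines as background tasks...
requests = [schedule_coroutine(tcp_echo_client("Message %d" % i, port))
            for i in range(5)]

# ...run the event loop until they have all completed...
run_in_foreground(asyncio.wait(requests))

# ...and then collect their results
results = [request.result() for request in requests]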

But what if we had an existing blocking client function that we wanted or needed to use (e.g. we're using an asyncio server to test a synchronous client API). To handle that case, we use our third helper function from the previous post:

def call_in_background(target, *, loop=None, executor=None):
    """Schedules and starts target callable as a background task

    If not given, *loop* defaults to the current thread's event loop
    If not given, *executor* defaults to the loop's default executor

    Returns the scheduled task.
    """
    if loop is None:
        loop = asyncio.get_event_loop()
    if callable(target):
        return loop.run_in_executor(executor, target)
    raise TypeError("target must be a callable, "
                    "not {!r}".format(type(target)))

To explore this, we'll need a blocking client, which we can build based on Python's existing socket programming HOWTO guide:

import socket
def tcp_echo_client_sync(message, port):
    conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print('-> Client connecting to port: %r' % port)
    conn.connect(('', port))
    print('-> Client sending: %r' % message)
    conn.send(message.encode())
    data = conn.recv(100).decode()
    print('<- Client received: %r' % data)
    print('-- Terminating connection on client')
    conn.close()
    return data

We can then use functools.partial in combination with call_in_background to start client requests in multiple operating system level threads:

>>> query_server = partial(tcp_echo_client_sync, "Hello World!", port)
>>> query_server2 = partial(tcp_echo_client_sync, "Hello World!", port2)
>>> bg_call = call_in_background(query_server)
-> Client connecting to port: 35876
-> Client sending: 'Hello World!'
>>> bg_call2 = call_in_background(query_server2)
-> Client connecting to port: 41672
-> Client sending: 'Hello World!'

Here we see that, unlike our coroutine clients, the synchronous clients have started running immediately in a separate thread. However, because the event loop isn't currently running in the main thread, they've blocked waiting for a response from the TCP echo servers. As with the coroutine clients, we address that by running the event loop in the main thread until our clients have both received responses:

>>> run_in_foreground(asyncio.wait([bg_call, bg_call2]))
-> Server received 'Hello World!' from ('', 52585)
<- Server sending: 'Hello World!'
-- Terminating connection on server
-> Server received 'Hello World!' from ('', 34399)
<- Server sending: 'Hello World!'
<- Client received: 'Hello World!'
-- Terminating connection on server
-- Terminating connection on client
<- Client received: 'Hello World!'
-- Terminating connection on client
({<Future finished result='Hello World!'>, <Future finished result='Hello World!'>}, set())
>>> bg_call.result()
'Hello World!'
>>> bg_call2.result()
'Hello World!'

Background tasks in Python 3.5

One of the recurring questions with asyncio is "How do I execute one or two operations asynchronously in an otherwise synchronous application?"

Say, for example, I have the following code:

>>> import itertools, time
>>> def ticker():
...     for i in itertools.count():
...         print(i)
...         time.sleep(1)
>>> ticker()
^CTraceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "<stdin>", line 4, in ticker
KeyboardInterrupt

With the native coroutine syntax coming in Python 3.5, I can change that synchronous code into event-driven asynchronous code easily enough:

import asyncio, itertools
async def ticker():
    for i in itertools.count():
        print(i)
        await asyncio.sleep(1)

But how do I arrange for that ticker to start running in the background? What's the coroutine equivalent of appending & to a shell command?

It turns out it looks something like this:

import asyncio
def schedule_coroutine(target, *, loop=None):
    """Schedules target coroutine in the given event loop

    If not given, *loop* defaults to the current thread's event loop

    Returns the scheduled task.
    """
    if asyncio.iscoroutine(target):
        return asyncio.ensure_future(target, loop=loop)
    raise TypeError("target must be a coroutine, "
                    "not {!r}".format(type(target)))

Update: This post originally suggested a combined "run_in_background" helper function that handled both scheduling coroutines and calling arbitrary callables in a background thread or process. On further reflection, I decided that was unhelpfully conflating two different concepts, so I replaced it with separate "schedule_coroutine" and "call_in_background" helpers.

So now I can do:

>>> import itertools
>>> async def ticker():
...     for i in itertools.count():
...         print(i)
...         await asyncio.sleep(1)
>>> ticker1 = schedule_coroutine(ticker())
>>> ticker1
<Task pending coro=<ticker() running at <stdin>:1>>

But how do I run that for a while? The event loop won't run unless the current thread starts it running and either stops when a particular event occurs, or when explicitly stopped. Another helper function covers that:

def run_in_foreground(task, *, loop=None):
    """Runs event loop in current thread until the given task completes

    Returns the result of the task.
    For more complex conditions, combine with asyncio.wait()
    To include a timeout, combine with asyncio.wait_for()
    """
    if loop is None:
        loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.ensure_future(task, loop=loop))

And then I can do:

>>> run_in_foreground(asyncio.sleep(5))

Here we can see the background task running while we wait for the foreground task to complete. And if I do it again with a different timeout:

>>> run_in_foreground(asyncio.sleep(3))

We see that the background task picked up again right where it left off the first time.

We can also single step the event loop with a zero second sleep (the ticks reflect the fact there was more than a second delay between running each command):

>>> run_in_foreground(asyncio.sleep(0))
>>> run_in_foreground(asyncio.sleep(0))

And start a second ticker to run concurrently with the first one:

>>> ticker2 = schedule_coroutine(ticker())
>>> ticker2
<Task pending coro=<ticker() running at <stdin>:1>>
>>> run_in_foreground(asyncio.sleep(0))

The asynchronous tickers will happily hang around in the background, ready to resume operation whenever I give them the opportunity. If I decide I want to stop one of them, I can cancel the corresponding task:

>>> ticker1.cancel()
>>> run_in_foreground(asyncio.sleep(0))
>>> ticker2.cancel()
>>> run_in_foreground(asyncio.sleep(0))

But what about our original synchronous ticker? Can I run that as a background task? It turns out I can, with the aid of another helper function:

def call_in_background(target, *, loop=None, executor=None):
    """Schedules and starts target callable as a background task

    If not given, *loop* defaults to the current thread's event loop
    If not given, *executor* defaults to the loop's default executor

    Returns the scheduled task.
    if loop is None:
        loop = asyncio.get_event_loop()
    if callable(target):
        return loop.run_in_executor(executor, target)
    raise TypeError("target must be a callable, "
                    "not {!r}".format(type(target)))

However, I haven't figured out how to reliably cancel a task running in a separate thread or process, so for demonstration purposes, we'll define a variant of the synchronous version that stops automatically after 5 ticks rather than ticking indefinitely:

import itertools, time
def tick_5_sync():
    for i in range(5):
        print(i)
        time.sleep(1)
The key difference between scheduling a callable in a background thread and scheduling a coroutine in the current thread, is that the callable will start executing immediately, rather than waiting for the current thread to run the event loop:

>>> threaded_ticker = call_in_background(tick_5_sync); print("Starts immediately!")
Starts immediately!
>>> 1

That's both a strength (as you can run multiple blocking IO operations in parallel), but also a significant weakness - one of the benefits of explicit coroutines is their predictability, as you know none of them will start doing anything until you start running the event loop.

Inaugural PyCon Australia Education Miniconf

PyCon Australia launched its Call for Papers just over a month ago, and it closes in a little over a week on Friday the 8th of May.

A new addition to PyCon Australia this year, and one I'm particularly excited about co-organising following Dr James Curran's "Python for Every Child in Australia" keynote last year, is the inaugural Python in Education miniconf as a 4th specialist track on the Friday of the conference, before we move into the main program over the weekend.

From the CFP announcement: "The Python in Education Miniconf aims to bring together community workshop organisers, professional Python instructors and professional educators across primary, secondary and tertiary levels to share their experiences and requirements, and identify areas of potential collaboration with each other and also with the broader Python community."

If that sounds like you, then I'd love to invite you to head over to the conference website and make your submission to the Call for Papers!

This year, all 4 miniconfs (Education, Science & Data Analysis, OpenStack and DjangoCon AU) are running our calls for proposals as part of the main conference CFP - every proposal submitted will be considered for both the main conference and the miniconfs.

I'm also pleased to announce two pre-arranged sessions at the Education Miniconf:

I'm genuinely looking forward to chairing this event, as I see tremendous potential in forging stronger connections between Australian educators (both formal and informal) and the broader Python and open source communities.

Accessing TrueCrypt Encrypted Files on Fedora 22

I recently got a new ultrabook (an HP Spectre 360), which means I finally have enough space to transfer my music files from the external drive where they've been stored for the past few years back to the laptop (there really wasn't enough space for them on my previous laptop, a first generation ASUS Zenbook, but even with the Windows partition still around, the extra storage space on the new device leaves plenty of room for my music collection).

Just one small problem: the bulk of the storage on that drive was in a TrueCrypt encrypted file, and the Dolphin file browser in KDE doesn't support mounting those as volumes through the GUI (at least, it doesn't as far as I could see).

So, off to the command line we go. While TrueCrypt itself isn't readily available for Fedora due to problems with its licensing terms, the standard cryptsetup utility supports accessing existing TrueCrypt volumes, and the tcplay package also supports creation of new volumes.

In my case, I just wanted to read the music files, so it turns out that cryptsetup was all I needed, but I didn't figure that out until after I'd already installed tcplay as well.

For both cryptsetup and tcplay, one of the things you need to set up in order to access a TrueCrypt encrypted file (as opposed to a fully encrypted volume) is a loopback device - these let you map a filesystem block device back to a file living on another filesystem. The examples in the tcplay manual page (man tcplay) indicated the command I needed to set that up was losetup.

However, the losetup instructions gave me trouble, as they appeared to be telling me I didn't have any loopback devices:

[ncoghlan@thechalk ~]$ losetup -f
losetup: cannot find an unused loop device: No such file or directory

Searching on Google for "fedora create a loop device" brought me to this Unix & Linux Stack Exchange question as the first result, but the answer there struck me as being far too low level to be reasonable as a prerequisite for accessing encrypted files as volumes.

So I scanned further down through the list of search results, with this Fedora bug report about difficulty accessing TrueCrypt volumes catching my eye. As with the Stack Exchange answer, most of the comments there seemed to be about reverting the effect of a change to Fedora's default behaviour - a change which meant that Fedora no longer came with any loop devices preconfigured.

However, looking more closely at Kay's original request to trim back the list of default devices revealed an interesting statement: "Loop devices can and should be created on-demand, and only when needed, losetup has been updated since Fedora 17 to do that just fine."

That didn't match my own experience with the losetup command, so I wondered what might be going on to explain the discrepancy, which is when it occurred to me that running losetup with root access might solve the problem. Generally speaking, ordinary users aren't going to have the permissions needed to create new devices, and I'd been running the losetup command using my normal user permissions rather than running it as root. That was a fairly straightforward theory to test, and sure enough, that worked:

[ncoghlan@thechalk ~]$ sudo losetup -f
/dev/loop0

Armed with my new loop device, I was then able to open the TrueCrypt encrypted file on the external GoFlex drive as a decrypted volume:

[ncoghlan@thechalk ~]$ sudo cryptsetup open --type tcrypt /dev/loop0 flexdecrypted

Actually supplying the password to decrypt the volume wasn't a problem, as I use a password manager to limit the number of passwords I need to actually remember, while still being able to use strong passwords for different services and devices.

However, even with my music files in the process of copying over to my laptop, this all still seemed a bit cryptic to me, even for the Linux command line. It would have saved me a lot of time if I'd been nudged in the direction of "sudo losetup -f" much sooner, rather than having to decide to ignore some bad advice I found on the internet and instead figure out a better answer by way of the Fedora issue tracker.

So I took four additional steps:

  • First, I filed a new issue against losetup, suggesting that it nudge the user in the direction of running it with root privileges if they first run it as a normal user and don't find any devices
  • Secondly, I followed up on the previous issue I had found in order to explain my findings
  • Thirdly, I added a new answer to the Stack Exchange question I had found, suggesting the use of the higher level losetup command over the lower level mknod command
  • Finally, I wrote this post recounting the tale of figuring this out from a combination of local system manual pages and online searches

Adding a right-click option to Dolphin to be able to automatically mount TrueCrypt encrypted files as volumes and open them would be an even nicer solution, but also a whole lot more work. The only actual change suggested in my above set of additional steps is tweaking a particular error message in one particular situation, which should be far more attainable than a new Dolphin feature or addon.

Stop Supporting Python 2.6 (For Free)

(Note: I'm speaking with my "CPython core developer" hat on in this article, rather than my "Red Hat employee" one, although it's the latter role that gave me broad visibility into the Fedora/RHEL/CentOS Python ecosystem)

Alex Gaynor recently raised some significant concerns in relation to his perception that Red Hat expects the upstream community to support our long term support releases for as long as we do, only without getting paid for it.

That's not true, so I'm going to say it explicitly: if you're currently supporting Python 2.6 for free because folks using RHEL 6 or one of its derivatives say they need it, and this is proving to be a hassle for you, then stop. If they complain, then point them at this post, as providing an easily linkable reference for that purpose is one of the main reasons I'm writing it. If they still don't like it, then you may want to further suggest that they come argue with me about it, and leave you alone.

The affected users have more options than they may realise, and upstream open source developers shouldn't feel obliged to donate their own time to help end users cope with organisations that aren't yet able to upgrade their internal infrastructure in a more timely fashion.

Red Hat Supported Python Upgrade Paths

Since September 2013, Red Hat Enterprise Linux subscriptions have included access to an additional component called Red Hat Software Collections. You can think of Software Collections roughly as "virtualenv for the system package manager", providing access to newer language runtimes (including Python 2.7 and 3.3), database runtimes, and web server runtimes, all without interfering with the versions of those integrated with (and used by) the operating system layer itself.

This model (and the fact they're included with the base Red Hat Enterprise Linux subscription) means that Red Hat subscribers are able to install and use these newer runtimes without needing to upgrade the entire operating system.

Since June 2014, Red Hat Enterprise Linux 7 has also been available, including an upgrade of the system Python to Python 2.7. The latest release of that is Red Hat Enterprise Linux 7.1.

As Red Hat subscriptions all include free upgrades to new releases, the main barrier to deployment of these newer Python releases is institutional inertia. While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time), and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly.

Beyond Red Hat Enterprise Linux, Red Hat's portfolio also includes both the OpenShift Platform-as-a-Service offering and the Feed Henry Mobile Application Platform. For many organisations looking to adopt an iterative approach to web service development, those are going to be a better fit than deploying directly to Red Hat Enterprise Linux and building a custom web service management system around that.

Third Party Supported Python Upgrade Paths

The current Red Hat supported upgrade paths require administrative access to a system. While it's aimed primarily at scientific users, the comprehensive Anaconda Python distribution from Continuum Analytics is a good way to obtain prebuilt versions of Python for Red Hat Enterprise Linux that can be installed by individual users without administrative access.

At Strata 2015, Continuum's Python distribution not only featured in a combined announcement regarding on-premise deployment of Anaconda Cluster together with Red Hat Storage, but also in Microsoft's announcement of Python support in the Azure Machine Learning service.

For users that don't need a full scientific Python distribution, Continuum Analytics also offer miniconda which just provides a Python runtime and the conda package manager, providing end users with a cross-platform way to obtain and manage multiple Python runtimes without needing administrative access to their systems.

Community Supported Python Upgrade Paths

The question of providing upgrade paths for folks without an active Red Hat subscription has also been taken into consideration.

In January 2014 Red Hat became an official sponsor of the long established CentOS project, with the aim of providing a stable base for community open source innovation above the operating system layer (this aim contrasts with the aims of the Fedora project, which is intended primarily to drive innovation within the operating system layer itself).

CentOS 7 was originally released in July 2014, and the latest release, CentOS 7 (1503), was published (as the name suggests) in March 2015.

For CentOS and other RHEL derivatives, the upstream project for Red Hat Software Collections is hosted at softwarecollections.org, making these collections available to the whole Fedora/RHEL/CentOS ecosystem, rather than only being available to Red Hat subscribers.

Folks running Fedora Rawhide that are particularly keen to be on the cutting edge of Python can even obtain prerelease Python 3.5 nightly builds as a software collection from Miro Hrončok's Fedora COPR repository.

In addition to maintaining Fedora itself, the Fedora community also maintains the Extra Packages for Enterprise Linux (EPEL) repositories, providing ready access to non-conflicting packages beyond those in the set included in the base Red Hat Enterprise Linux and CentOS releases.

Providing Commercial Red Hat Enterprise Linux Support

If projects are regularly receiving requests for support on Red Hat Enterprise Linux and derived platforms, and the developers involved are actively looking to build a sustainable business around their software, then a steady stream of these requests may represent an opportunity worth exploring. After all, Red Hat's subscribers all appreciate the value that deploying commercially supported open source software can bring to an organisation, and are necessarily familiar with the use of software subscriptions to support sustaining engineering and the ongoing development of new features for the open source software that they deploy.

One of the key things that many customers are looking for is pre-release integration testing to provide some level of assurance that the software they deploy will work in their environment, while another is a secure development and distribution pipeline that ensures that the software they install is coming from organisations that they trust.

One of Red Hat's essential tools for managing this distributed integration testing and content assurance effort is its Partner Program. This program is designed to not only assist Red Hat subscribers in finding supported software that meets their needs, but also to provide Red Hat, partners, and customers with confidence that the components of deployed solutions will work well together in target deployment environments.

Specifically for web service developers that would like to provide a supported on-premise offering, last year's announcement of a Container Certification Program (with Red Hat Enterprise Linux 7 and the OpenShift Platform-as-a-Service offering as certified container hosts) extended Red Hat's certification programs to cover the certification of Docker containers in addition to other forms of Linux application deployment.

Even more recently, the Red Hat Container Development Kit was introduced to help streamline that certification process for Red Hat Independent Software Vendor Partners.

But these are all things that folks should only explore if they're specifically interested in building a commercial support business around their software. If users are trying to get long term maintenance support for a community project for free, then upstream developers should be sending a single unified message in response: don't assume you'll be able to run new versions of open source software on old platforms unless you're specifically paying someone to ensure that happens.

I'm not saying this because I work for a platform vendor that gets paid (at least in part) to do this, I'm saying it because most open source projects are maintained by innovators that are upgrading their technology stacks regularly, where versions of components are already old after 2 years and truly ancient after 5. Expecting open source innovators to provide long term maintenance for free is simply unreasonable - regularly upgrading their own stacks means they don't need this long term platform support for themselves, so the folks that are seeking it should be expected to pay for it to happen. (Apparent exceptions like CentOS aren't exceptions at all: sustaining engineering on CentOS is instead a beneficial community byproduct of the paid sustaining engineering that goes into Red Hat Enterprise Linux).

Abusing Contributors is not OK

As reported in Ars Technica, the ongoing efforts to promote diversity in open source communities came up once more during the plenary Q&A session with Linus Torvalds, Andrew Tridgell, Bdale Garbee and Rusty Russell.

I was there for that session, and found that Linus's response appeared to betray a fundamental misunderstanding of the motives of many of the folks pushing for increased diversity in the open source community, as well as a lack of awareness of the terrible situations that can arise when leaders in a community regularly demonstrate abusive behaviour without suffering any significant consequences (just ask folks like Kathy Sierra, Zoe Quinn, Anita Sarkeesian and Brianna Wu that have been subjected to sustained campaigns of harassment largely for being women that dared to have and express an opinion on the internet).

As the coordinator of the Python Software Foundation's contribution to the linux.conf.au 2015 financial assistance program, and as someone with a deep personal interest in the overall success of the open source community, I feel it is important for me to state explicitly that I consider Linus's level of ignorance around appropriate standards of community conduct to be unacceptable in an open source community leader in 2015.

Linus's defence of his abusive behaviour is that he's "not nice", and "doesn't care about you". He does care deeply about his project, though, and claims to be motivated primarily by wanting that to continue to be successful.

To be completely honest, the momentum behind the Linux juggernaut is now large enough that Linus could likely decide to chuck it all in and spend the rest of his life on a beach sipping cocktails without worrying about open source politics, and people would figure out a way to ensure that Linux continued to thrive and grow without him. Many a successful start-up has made that transition when the founders leave, and there's no good reason to believe an open source community would be fundamentally different in that regard. The transition to a new leadership structure might be a little messy, but the community would almost certainly figure it out.

However, there's still a lot of scope for Linus to influence how fast Linux grows, and on that front his words and actions suggest that he considers being careless in his speech, without regard for the collateral damage his verbal broadsides may be doing to his cause, more important than having the most positive impact he is capable of having on the future growth of the Linux kernel development project and the open source community at large.

It's not (necessarily) about being nice

It may surprise some folks to learn that I don't consider myself a nice human either. My temper is formidable (I just keep it under control most of the time, a task online communication makes easier by providing the ability to walk away from the computer for a while), and any feelings of compassion I have for others are more a product of years of deliberate practice and hanging around with compassionate people than they are any particularly strong innate knack for empathy.

I'm pretty sure that genuinely nice people do exist, and I assume that one of their key motives for creating open, welcoming, inclusive communities is because it's fundamentally the right thing to do. The main reason I exclude myself from my assumed category of "nice people" is that, while I acknowledge that motivation intellectually, it's not really one that I feel viscerally.

Instead, what I do care about, passionately, is helping the best ideas win (where I include "feasible" as part of my definition of "best"). Not the "best ideas from people willing to tolerate extensive personal abuse". The best ideas anyone is willing to share with me, period. And I won't hear those ideas unless I help create environments where all participants are willing to speak up, not just those that are prepared to accept a blistering verbal barrage from a powerful authority figure as a possible consequence of attempting to participate. Such barrages are upsetting enough when they come from random strangers on the internet; when they come from someone with enormous influence not only over you and your future career, but also over your entire industry, they can be devastating.

The second order consequences

So there you have it, my not-nice reason for advocating for more welcoming and inclusive open source communities: because, from an engineering standpoint, I consider "has a high level of tolerance for receiving personal abuse from community leaders" to be an extraordinarily stupid filter to apply to your pool of potential contributors.

Exhibiting abusive behaviour as a leader has additional consequences though, and they can be even more problematic: by regularly demonstrating abusive behaviour yourself, you end up normalising harassment within your community in general, both in public and in private.

I believe Linus when he says he doesn't care about who people are or where they're from, only their contributions. I'm the same way - until I've known them for a while, I tend to care about contributors and potential contributors wholesale (i.e. happy people that enjoy the environment they're participating in tend to spend more time engaged, learn faster, produce superior contributions, and more quickly reach the point of being able to contribute independently), rather than retail (i.e. I care about my friends because they're my friends, regardless of context).

But when you're personally abusive as a leader, you also have to take a high level of responsibility for all the folks that look up to you as a role model, and act out the same behaviours you exhibit in public. When you reach this point, the preconditions for participation in your community now include:

  • Willing to tolerate public personal abuse from project leaders
  • Willing to tolerate public personal abuse from the community at large
  • Willing to tolerate personal abuse in private

With clauses like that as part of the definition, the claim of "meritocracy" starts to feel a little shaky, doesn't it? Meritocracy is a fine ideal to strive for, but claiming to have achieved it when you're imposing irrelevant requirements like this is arrogant nonsense.

We're not done yet, though, as this culture of abuse then combines with elitism based on previously acquired knowledge to make it normal to abuse newcomers for still being in the process of learning. I find it hard to conceive of a more effective approach to keeping people from adopting something you're giving away for free than tolerating a community that publicly abuses people for not magically already knowing how to use technology that they may have never even heard of before.

As a result of this perspective, the only time I'll endeavour to eject anyone from a community where I have significant influence is when they're actively creating an unpleasant environment for other participants, and demonstrate no remorse whatsoever regarding the negative impact their actions are having on the overall collaborative effort. I count myself incredibly fortunate to have only had to do this a few times in my life so far, but it's something I believe in strongly enough for it to have been the basis for once deciding to resign from a position paying a six-figure salary at a company I otherwise loved working for. To that company's credit, the abusive leader was let go not long afterwards, but the whole secretive corporate system is rigged such that these toxic "leaders" can usually quickly find new positions elsewhere and hence new subordinates to make miserable - the fact that I'm not willing to name names here for fear of the professional consequences is just one more example of how the system is largely set up to protect abusive leaders rather than holding them to account for the impact their actions have on others.

Ideas and code are still fair game

One of the spurious fears raised against the apparently radical notion of refusing to tolerate personal abuse in a collaborative environment is that adopting civil communication practices somehow means that bad code must then be accepted into the project.

Eliminating personal abuse doesn't mean eliminating rigorous critique of code and ideas. It just means making sure that you are critiquing the code and the ideas, rather than tearing down the person contributing them. It's the difference between "This code isn't any good, here are the problems with it, I'm confident you can do better on your next attempt" (last part optional but usually beneficial when it comes to growing your contributor community) and "This code is terrible, how dare you befoul my presence with it, begone from my sight, worm!".

The latter response may be a funny joke if done in private between close friends, but when it's done in public, in front of a large number of onlookers who don't know either the sender or the recipient personally, it sets an astoundingly bad example as to what a mutually beneficial collaborative peer relationship should look like.

And if you don't have the self-discipline needed to cope with the changing context of our online interactions in the open source community? Well, perhaps you don't yet have the temperament needed to be an open source leader on an internet that is no longer the sole preserve of those of us that are more interested in computers than we are in people. Most of the technical and business press have yet to figure out that they can actually do a bit of investigative journalism to see how well vendor rhetoric aligns with upstream open source engineering activity (frequency of publication is still a far more important performance metric for most journalists than questioning the spin served up in corporate press releases), so the number of folks peering into the open source development fishbowl is only going to grow over time.

It isn't that hard to learn the necessary self-control, though. It's mostly just a matter of taking the time to read each email or code review comment, look for the parts that are about the contributor rather than the code or the design, and remove them before hitting send. And if that means there's nothing left? Then what you were about to send was pure noise, adding nothing useful to the conversation, and hence best left unsent. Doing anything less than this as a community leader is pure self-indulgence, putting your own unwillingness to consider the consequences of your communications ahead of the long term interests of your project. We're only talking about software here, after all - lives aren't on the line when we're deciding how to respond to a particular contribution, so we can afford to take a few moments to review the messages we're about to send and consider how they're likely to be perceived, both by the recipient, and by everyone else observing the exchange.

With any personal abuse removed, you can be as ruthless about critiquing the code and design as you like. Learning not to take critiques of your work personally is a necessary skill to acquire if your ambition is to become a high profile open source developer - the compromises often necessary in the real world of software design and development mean that you will end up shipping things that can legitimately be described as terrible, and you're going to have to learn to be able to say "Yes, I know it's terrible, for reasons X, Y, and Z, and I decided to publish it anyway. If you don't like it, don't use it.". (I highly recommend giving talks about these areas you know are terrible - they're fun to prepare, fun to give, and it's quite entertaining seeing the horrified reactions when people realise I'm not kidding when I say all software is terrible and computers don't actually work, they just fake it fairly well through an ongoing series of horrible hacks built atop other horrible hacks. I'm not surprised the Internet breaks sometimes - given the decades of accumulated legacy hardware and software we're building on and working around, it's thoroughly astonishing that anything technology related ever works at all)

But no matter how harsh your technical critiques get, never forget that there's at least one other human on the far end of that code review or email thread. Even if you don't personally care about them, do you really think it's a good idea to go through life providing large numbers of people with public evidence of why you are a thoroughly unpleasant person to be forced to deal with? As a project leader, do you really think you're going to attract the best and brightest people, who are often free to spend their time and energy however they like, if you set up a sign saying "You must be willing to tolerate extensive personal abuse in order to participate here"?

What can we do about it?

First, and foremost, for those of us that are paid open source community leaders, we can recognise that understanding and motivating our contributors and potential contributors in order to grow our communities is part of our job. If we don't like that, if we'd prefer to be able to "just focus on the code", to the degree where we're not willing to learn how to moderate our own behaviour in accordance with our level of responsibility, then we need to figure out how to reorganise things such that there's someone with better people management and communication skills in a position to act as a buffer between us and our respective communities.

If we instead decide we need to better educate ourselves, then there are plenty of resources available for us to do so. For folks just beginning to explore questions of systemic bias and defaulting to exclusivity, gender-based bias is a good place to start, whether by perusing resources like the Feminism 101 section on the Geek Feminism wiki, or (if we have the opportunity) by attending an Ada Initiative Ally Skills workshop.

And if we do acknowledge the importance of this work, then we can use our influence to help it continue, whether that's by sponsoring educational workshops, supporting financial assistance programs, ensuring suitable codes of conduct are in place for our events and online communities, supporting programs like the GNOME Outreach Program for Women, or organisations like the Ada Initiative, and so on, and so forth.

For those of us that aren't community leaders, one of the most effective things we can do is vote with our feet: at last count, there are over a million open source projects in existence, many of which are run in such a way that participating in them is almost always a sheer pleasure, and if no existing community grabs your interest, you always have the option of starting your own.

Personal enjoyment is only one reason for participating in open source though, and professional obligations or personal needs may bring us into contact with project leaders and contributors that currently consider personal abuse to be an acceptable way of interacting with their peers in a collaborative context. If leaving isn't a viable option, then what can we do?

Firstly, the options I suggest above for community leaders are actually good options for any participants in the open source community that view the overall growth and success of the free and open source software ethos as being more important than any one individual's personal pride or reluctance to educate themselves about issues that don't affect them personally.

Secondly, we can hold our leaders to account. When community leaders give presentations at community events, especially when presenting on community management topics, feel free to ask the following questions (or variations on these themes):

  • Are we as community leaders aware of the impact current and historical structural inequalities have on the demographics of our community?
  • What have we done recently as individuals to improve our understanding of these issues and their consequences?
  • What have we done recently as community leaders to attempt to counter the systemic biases adversely affecting the demographics of our communities?

These are questions that open source community leaders should be able to answer. When we can't, I guess the silver lining is that it means we have plenty of scope to get better at what we do. For members of vulnerable groups, an inability for leaders to answer these questions is also a strong sign as to which communities may not yet be able to provide safe spaces for you to participate without experiencing harassment over your identity rather than being critiqued solely based on the quality of your work.

If you ask these questions, you will get people complaining about bringing politics into a technical environment. The folks complaining are simply wrong, as the single most important factor driving the quality of our technology is the quality of our thinking. Assuming we have attained meritocracy (aka "the most effective collaborative environment possible") is sheer foolishness, when a wide array of systemic biases remain in place that work to reduce the depth, breadth, and hence quality, of our collective thinking.

Update 22 Jan, 2015: Minor typo and grammar fixes

DTCA Public Consultation - Brisbane

Over the weekend, Asher Wolf alerted me (and many others in the open source and cryptographic communities) to the Australian Defence Trade Controls Act 2012, and the current public consultation taking place around a bill proposing amendments to that act.

Being heavily involved in improving the security of open source infrastructure like the Python Package Index and the Python 2 reference interpreter, working at a multinational open source vendor, and having an extensive background in working under the constraints of the US International Traffic in Arms Regulations, I paid close attention to Asher's concern, since bad legislation in this area can have significant chilling effects on legitimate research and development activities.

As a result, I've escalated this legislation for review by the legal teams at various open source companies and organisations, with a view to making formal submissions to the public consultation process that is open until January 30th (ready for bills to be submitted for consideration to federal parliament on February 23rd).

However, I was also able to attend the first public consultation session held at the University of Queensland on January 19, so these are my impressions based primarily on that session and my own experiences dealing with ITAR. I'm not a lawyer and I haven't actually read the legislation, so I'm not going to pick up on any drafting errors, but I can at least speak to the intent of the folks involved in moving this process forward.

What not to worry about

To folks encountering this kind of legislation for the first time, the sheer scope of the Defence and Strategic Goods List can seem absolutely terrifying. This was very clear to me from some of the questions various academics in the room were asking.

On this particular point, I can only say: "Don't panic". This isn't a unique-to-Australia list, it's backed by a treaty called the Wassenaar Arrangement - the DSGL represents part of the implementation of that arrangement into Australian law.

When the laws implementing that arrangement are well drafted, everyone outside the military industrial complex (and certain easily weaponised areas of scientific research) can pretty much ignore them, while everyone inside the military industrial complex (and the affected areas of research) pays very close attention to them because we like not being in jail (and because gunrunning is bad, and bioterrorism is worse, mmm'kay?).

A heavily regulated military supply chain is already scary enough, we really don't want to see the likely consequences of an unregulated one. (And if you're tempted to make a snarky comment about the latter already being the case, no, it really isn't. While folks can sometimes use overclassification to avoid regulations they're supposed to be following, that still introduces significant friction and inefficiencies into whatever they're doing. It's not as good as people actually respecting the laws of the countries they're supposedly defending, including genuinely meeting the requirement for civilian authority over the military, but it's still a hell of a lot better than nothing).

Getting back on topic, the US ITAR and crypto export control laws are currently considered the most strict implementation of the Wassenaar Arrangement amongst the participating nations (going beyond the requirements of the treaty in several areas), so if you see plenty of US nationals participating in an activity without being fined and going to jail, you can be fairly confident that it isn't actually a controlled activity under the DSGL (or, even if it is, permits for that specific activity will be fairly easy to get, and the most likely consequence of not realising you need a permit for something you're doing will be someone from your government getting in touch to point out that you should apply for one).

There are certainly some very questionable aspects of this list (with the perennial "favourite" being the fact that the Wassenaar Arrangement does, in fact, attempt to regulate the global trade in mathematics, which is just as stupid and problematic as it sounds), but it's a known quantity, and one we're pretty sure we can continue to live with (at least for the time being).

What to worry about

The real problem here is that the regulations included in the 2012 Act are not well drafted, and the legislated 2 year transition period from May 2013 through to May 2015 prior to the enforcement provisions kicking in is about to run out.

The biggest problem with the 2012 Act is that, in trying to keep things simple (essentially, "if it's on the DSGL, you need a permit"), it ended up becoming extraordinarily draconian, requiring a permit for activities that don't require an export license even under ITAR.

For the general public, the most significant shift in the 2015 amendment bill is that open publication of information related to dual-use technologies becomes allowed by default in several cases, with a permit only required in exceptional circumstances (and in those cases, the onus would be on the government to inform the covered individuals of that requirement).

The amendments also include a variety of additional exemptions for little things like making it legal for Australia's own police and security agencies to collaborate with their international counterparts. (Snarky comment opportunity #2: in certain areas, making such collaboration illegal seems like a potentially attractive idea...)

That 2 year transition period was included in the original legislation as a safety mechanism, and the feedback from the associated steering group has been extensive. If things had gone according to plan, the relevant amendments to the bill would have been passed last year in the spring sitting of federal parliament, leaving DECO with at least 6 months to educate affected organisations and individuals, and to start issuing the now necessary permits before the enforcement provisions became active in May. Unfortunately, we currently have a federal government that views pushing a particular ideological agenda as being more important than actually doing their job, so we're now faced with the prospect of regulations that industry doesn't want, academia doesn't want, the Australian public service doesn't want, and the Australian military doesn't want, coming into effect anyway.

Isn't politics fun?

What DECO are (trying) to do about it

The group tasked with untangling this particular legislative Charlie Foxtrot is the Australian Defence Export Control Office (DECO). Their proposal for addressing the situation hinges on two bills that they plan to put before the next sitting of federal parliament:

  • an amendment bill for the Act itself, which fixes it to be a conventional implementation of the Wassenaar Arrangement, in line with existing implementations in other Wassenaar nations (why we didn't just do that in the first place is beyond me, but at least DECO are trying to fix the mistake now)
  • a second bill to delay the enactment of the enforcement provisions for a further six months to provide sufficient time for DECO to properly educate affected parties and start issuing permits

As far as I am aware, the second bill is needed primarily due to the consideration of the first bill slipping by six months, since we're now looking at the prospect of only having 4 weeks for DECO to start issuing permits before the enforcement provisions come into effect. Nobody involved thinks that's a good idea.

If both of those bills pass promptly, then the only cause for concern is whether or not there are any remaining devils in the details of the legislation itself. Members of the general public aren't going to be able to pick those up - despite the surface similarities, legalese isn't English, and reading it without interpreting it in the context of relevant case law can be a good way to get yourself into trouble. Summary translations from legalese to English by a competent lawyer are a much safer bet, although still not perfect. (For the programmers reading this: I personally find it useful to think of legalese as source code that runs on the language interpreter of a given nation's legal system, while the English translations are the code comments and documentation that anyone should be able to read if they understand the general concepts involved).

If at least the second bill passes, then we have another 6 months to work on a better resolution to the problem.

If neither bill passes, then DECO end up in a bad situation where they'll be required by law to implement and enforce regulations that they're convinced are a bad idea. They actually have everything in place to do that if they have to, but they don't want this outcome, and neither does anyone else.

What industry and academia can do about it

While it's very short notice, the main thing industry and academia can do is to file formal submissions with DECO as described in their overview of the public consultation process.

There are three main things to be addressed on that front:

  • ensuring federal parliament are aware of the importance of amending the Defence Trade Controls Act 2012 to eliminate the more draconian provisions
  • ensuring federal parliament are aware of the infeasibility of putting this into effect on the original timeline and the need for a significant delay in the introduction of the enforcement provisions
  • ensuring DECO are alerted to any remaining areas of concern in the specific drafting of the amended legislation (although I'd advise skipping this one if you're not a lawyer yourself - it's the functional equivalent of a lawyer with no training as a programmer proposing patches to the Linux kernel)

We were apparently asleep at the wheel when DTCA went through in 2012, so we owe a lot of thanks to whoever it was that advocated for and achieved the inclusion of the two year transition and consultation period in the original bill. Now we need to help ensure that our currently somewhat dysfunctional federal parliament doesn't keep us from receiving the benefit of that foresight.

What's definitely not going to happen

This consultation process is not the place to rail against the details of the Wassenaar Arrangement or Australia's participation in it. You won't achieve anything except to waste the time of folks that currently have a really serious problem to fix, and a very limited window in which to fix it.

Yes, Wassenaar has some serious problems, especially around its handling of cryptography and cryptographic research, but we have a fairly settled approach to handling that at this point in history. The critical concern in this current case is to help DECO ensure that the associated Australian regulations can be readily handled through the mechanisms that have already been put in place to handle existing Wassenaar enforcement regimes in other countries. With the way the 2012 Act was drafted, that's almost certainly currently not the case, but the proposed 2015 amendments should fix it (assuming the amendments actually have the effects that DECO has indicated they're intended to).

Running Kallithea on OpenShift

Kallithea for CPython

The CPython core development team are currently evaluating our options for modernising our core development workflows to better match the standards set by other projects and services like OpenStack and GitHub.

The first step in my own proposal for that is to migrate a number of the support repositories currently hosted using a basic Mercurial server on hg.python.org to an instance of Kallithea hosted as forge.python.org. (Kallithea is a GPLv3 Python project that was forked from RhodeCode after certain aspects of the latter's commercialisation efforts started alienating several members of their user and developer community)

Tymoteusz Jankowski (a contributor to Allegro Group's open source data centre inventory management system, Ralph), has already started looking at the steps that might be involved in integrating a Kallithea instance into the PSF's Salt based infrastructure automation.

However, for my proposal to be as successful as I would like it to be, I need the barriers to entry for the development and deployment of the upstream Kallithea project itself to be as low as possible. One of the challenges we've often had with gaining contributors to CPython infrastructure maintenance is the relatively high barriers to entry for trying out service changes and sharing them with others, so this time I plan to tackle that concern first, by ensuring that addressing it is a mandatory requirement in my proposal.

That means tackling two particular problems:

  • Having a way to easily run local test instances for development and experimentation
  • Having a way to easily share demonstration instances with others

For the first problem, I plan to rely on Vagrant and Docker, while for the second I'll be relying on the free tier in Red Hat's OpenShift Online service. Unfortunately, while the next generation of OpenShift will support Docker images natively, for the time being, I need to tackle these as two separate problems, as there aren't any existing Docker based services I'm aware of with a free tier that is similarly suited to the task of sharing development prototypes for open source web services with a broad audience (let alone any such services that are also fully open source).

Once I have these working to my satisfaction, I'll propose them to the Kallithea team for inclusion in the Kallithea developer documentation, but in the meantime I'll just document them here on the blog.

Enabling Kallithea deployment on OpenShift

My first priority is to get a public demonstration instance up and running that I can start tweaking towards the CPython core development community's needs (e.g. installing the custom repo hooks we run on hg.python.org), so I'm starting by figuring out the OpenShift setup needed to run public instances - the Vagrant/Docker based setup for local development will come later.

Conveniently, WorldLine previously created an OpenShift quickstart for RhodeCode and published it under the Apache License 2.0, so I was able to use that as a starting point for my own Kallithea quickstart.

While I personally prefer to run Python web services under mod_wsgi in order to take advantage of Apache's authentication & authorisation plugin ecosystem, that's not a significant concern for the demonstration server use case I have in mind here. There are also some other aspects in the WorldLine quickstart I'd like to understand better and potentially change (like figuring out a better way of installing git that doesn't involve hardcoding a particular version), but again, not a big deal for demonstration instances - rather than worrying about them too much, I just annotated them as TODO comments in the OpenShift hook source code.

I'd also prefer to be running under the official Python 2.7 cartridge rather than a DIY cartridge, but again, my focus at this point is on getting something up and running, and then iterating from there to improve it.

That meant adapting the quickstart from RhodeCode to Kallithea was mostly just a matter of changing the names of the various components being installed and invoked, together with changing the actual installation and upgrade steps to be based on Kallithea's deployment instructions.

The keys to this are the build hook and the start hook. The OpenShift docs have more details on the various available action hooks and when they're run.
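
For readers that haven't encountered OpenShift's DIY cartridge before, the sketch below shows the general shape of a start hook for an application like this one. It's purely illustrative rather than being the actual quickstart hook, and it assumes the build hook has already set up a virtualenv and a Kallithea configuration file under $OPENSHIFT_DATA_DIR, with the ini file bound to the DIY cartridge's IP and port:

#!/bin/bash
# .openshift/action_hooks/start (illustrative sketch only, not the real quickstart hook)
# Assumes the build hook created the virtualenv and production.ini referenced below
set -e
source "$OPENSHIFT_DATA_DIR/venv/bin/activate"
cd "$OPENSHIFT_DATA_DIR/kallithea"
nohup paster serve production.ini >> "$OPENSHIFT_DIY_LOG_DIR/kallithea.log" 2>&1 &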

In addition to the TODO comments noted above, I also added various comments explaining what different parts of the action hook scripts were doing.

(Note: I haven't actually tested an upgrade, only the initial deployment described below, so I can't be sure I have actually adapted the upgrade handling correctly yet)

Deploying my own Kallithea instance

I already have an OpenShift account, so I could skip that step, and just create a new app under my existing account. However, I didn't have the command line tools installed, so that was the first step in creating my own instance:

sudo yum install /usr/bin/rhc

yum is able to figure out on my behalf that it is rubygems-rhc that provides the command line tools for OpenShift in Fedora (alternatively, I could have looked that up myself in the OpenShift client tools installation docs).

The next step was to configure the command line tools to use my OpenShift Online account, generate a local login token for this machine, and upload my public SSH key to OpenShift Online. That process involved working through the interactive prompts in:

rhc setup

With those preliminary OpenShift steps out of the way, it was time to move on to deploying the application itself. It's worth noting that app creation automatically clones a local git repo named after the application, so I created a separate "app_repos" subdirectory in my development directory specifically so I could call my OpenShift app "kallithea" without conflicting with my local clone of the main kallithea repo.

As described in the quickstart README, the app creation command is:

rhc app create kallithea diy-0.1 postgresql-9.2

That churned away for a while, and then attempted to clone the app repo locally over ssh (with SSH putting up a prompt to accept the validity of the app's freshly generated SSH key). I'm not sure why, but for some reason that automatic clone operation didn't work for me. rhc put up a detailed message explaining that the app creation had worked, but the clone step had failed. Fortunately, as the troubleshooting notice suggested, a subsequent rhc git-clone kallithea worked as expected.

OpenShift provides a default app skeleton automatically, but I actually want to get rid of that and replace it with the contents of the quickstart repo:

rm -R diy .openshift misc README.md
git add .
git commit -m "Remove template files"
git remote add quickstart -m master https://github.com/ncoghlan/openshift-kallithea.git
git pull -s recursive -X theirs quickstart master

The default merge commit message that popped up was fine, so I just accepted that and moved on to the most interesting step:

git push

Because this is the first build, there's a lot of output related to installing and building the PostgreSQL driver and git, before moving on to installing Kallithea and its dependencies.

However, that still didn't take long, and completed without errors, so I now have my own Kallithea instance up and running.

And no, the default admin credentials created by the quickstart won't work anymore - I immediately logged in to the admin account to change them!

Where to from here?

There are various aspects of the current quickstart that are far from ideal, but I don't plan to spend a lot of time worrying about them when I know that support for using Docker images directly in OpenShift is coming at some point in the not too distant future.

One of the key advantages of Docker is the much nicer approach it offers to layered application development where infrastructure experts can provide base images for others to build on, and in the case of deploying Python applications with mod_wsgi, that means listening to Graham Dumpleton (the author of mod_wsgi, currently working for New Relic).

On that front, Graham has actually been working on creating a set of Debian based mod_wsgi Docker images that Python developers can use, rather than having to build their own from scratch.

In my case, I'd really prefer something based on CentOS 7 or Fedora Cloud, but that's a relatively minor quibble, and Graham's images should still make a great basis for putting together a Vagrant+Docker based local workflow for folks working on Kallithea.

That, however, is a topic for a future post :)

Seven billion seconds per second

A couple of years ago, YouTube put together their "One hour per second" site, visualising the fact that for every second of time that elapses, an hour of video is uploaded to YouTube. Their current statistics page indicates that figure is now up to 100 hours per minute (about 1.7 hours per second).

Impressive numbers to be sure. However, there's another set of numbers I personally consider significantly more impressive: every second, more than seven billion seconds are added to the tally of collective human existence on Earth.

Think about that for a moment.

Tick. Another 7 billion seconds of collective human existence.

Tick. Another 117 million minutes of collective human existence.

Tick. Another 2 million hours of collective human existence.

Tick. Another 81 thousand days of collective human existence.

Tick. Another 11 thousand weeks of collective human existence.

Tick. Another 222 years of collective human existence.

222 years of collective human experience, every single second, of every single day. And as the world population grows, it's only going to get faster.

222 years of collective human experience per second.

13 millennia per minute.

801 millennia per hour.

19 million years per day.

135 million years per week.

7 billion years per year.

The growth in our collective human experience over the course of a single year would stretch halfway back to the dawn of time if it was experienced by an individual.
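
(For anyone that wants to check my arithmetic, those conversions are easy to reproduce - the quick calculation below assumes a population of exactly 7 billion and a 365.25 day year)

# Collective human experience accumulated per elapsed second, in various units
echo "scale=1; 7000000000 / 60" | bc          # ~117 million minutes
echo "scale=1; 7000000000 / 3600" | bc        # ~1.9 million hours
echo "scale=1; 7000000000 / 86400" | bc       # ~81,000 days
echo "scale=1; 7000000000 / 604800" | bc      # ~11,600 weeks
echo "scale=1; 7000000000 / 31557600" | bc    # ~222 years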

We currently squander most of that potential. We allow a lot of it to be wasted scrabbling for the basic means of survival like food, clean water and shelter. We lock knowledge up behind closed doors, forcing people to reinvent solutions to already solved problems because they can't afford the entry fee.

We ascribe value to people based solely on their success in the resource acquisition game that is the market economy, without acknowledging the large degree to which sheer random chance is frequently the determinant in who wins and who loses.

We inflict bile and hate on people who have the temerity to say "I'm here, I'm human, and I have a right to be heard", while being different from us. We often focus on those superficial differences, rather than our underlying common humanity.

We fight turf wars based on where we were born, the colour of our skin, and which supernatural beings or economic doctrines we allow to guide our actions.

Is it possible to change this? Is it possible to build a world where we consider people to have inherent value just because they're fellow humans, rather than because of specific things they have done, or specific roles they take up?

I honestly don't know, but it seems worthwhile to try. I certainly find it hard to conceive of a better possible way to spend my own meagre slice of those seven billion seconds per second :)