Posts/2012/07

Volunteer-developed free-threaded cross-platform virtual machines?

Since writing my Python 3 Q & A, including some thoughts on why the CPython GIL isn't likely to go away any time soon, I've been pondering the question of free-threaded cross-platform virtual machines for dynamic languages. Specifically, I've been trying to think of any examples of such that are driven almost entirely by volunteer-based development.

A brief VM survey


The JVM and Dalvik have plenty of full-time developers, and Microsoft's CLR not only has full-time developers, it also isn't cross-platform.
Mono's core development was funded directly by first Ximian, then Novell, and now Xamarin, and since the CLR is free-threaded, free-threading support would have been a requirement from the start.

However, if we switch over to the dynamic language specific VM side, the reference implementations of both Python and Ruby use a Global Interpreter Lock to ease maintenance and maximise speed of execution in the original single-threaded scripting use case. This means neither can scale to multiple cores without using either multiple processes and some form of inter-process communication, or else invoking code that doesn't need to hold the interpreter lock (e.g. C extensions for CPython).
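For illustration, here's a minimal sketch of the multiple-process workaround using the standard library's multiprocessing module (the prime-counting task is just an invented CPU-bound stand-in):

```python
# Each worker process gets its own interpreter and its own GIL, so
# CPU-bound work can use multiple cores; arguments and results are
# pickled for the inter-process communication step.
from multiprocessing import Pool

def count_primes(bound):
    """Invented CPU-bound task: count the primes below *bound*."""
    count = 0
    for n in range(2, bound):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count(bounds):
    with Pool() as pool:  # one worker process per core by default
        return pool.map(count_primes, bounds)
```

Running the same task in threads wouldn't speed it up under CPython, since only one thread at a time can hold the GIL while executing bytecode.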

Both Python and Ruby have JVM and CLR implementations that are free-threaded (Jython, JRuby, IronPython, IronRuby), since they can take advantage of the cross-platform threading primitives in the underlying corporate sponsored VM.

Rubinius, with Engine Yard's backing, is creating a free-threaded Ruby interpreter in the form of Rubinius 2.0. In my opinion, they've done something smart by avoiding the Win32 API entirely and just writing POSIX code, leaving the task of dealing with Microsoft's idiosyncratic approach to OS interfaces as a problem for the MinGW developers. Unfortunately (from the point of view of this particular problem), CPython long ago adopted the approach of treating Windows as a first class native build target, rather than requiring the use of a third party POSIX compatibility layer.

PyPy is heading in a different direction, focusing on making Software Transactional Memory a viable approach to concurrency in Python, without the well-known data corruption and deadlock pitfalls of thread-based concurrency.

Lua doesn't support native threading in the core VM at all - it just has a couple of GIL hooks that are no-ops by default, but can be overridden to implement a GIL.

Perl 5 supports threads using the subinterpreter model - by default, all state is thread local and you have to take explicit steps to make it visible to other threads. Perl also warns that using threads may lead to segfaults when using non-thread-safe modules.

Parrot (and thus Perl 6) has a rather ambitious concurrency model, but I have no idea how well it works in practice. With Perl 6 still in development, are there any documented production deployments?

Javascript doesn't support full shared-memory threading, only Web Worker Threads. Since objects have to be serialised for inter-thread communication, the model is closer to lightweight processes than it is to shared memory threading.
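Python's multiprocessing module provides a rough analogue of that model: objects are pickled as they cross the boundary, so each side ends up working on its own copy rather than on shared state. A minimal sketch (the function names here are mine, purely for illustration):

```python
# Demonstrates copy-on-communication semantics: the child mutates its
# own pickled copy of the list, leaving the parent's original untouched.
from multiprocessing import Process, Queue

def mutate(channel, data):
    data.append("child")   # affects only the child's copy
    channel.put(data)      # send the modified copy back (pickled again)

def demo():
    channel = Queue()
    data = ["parent"]
    child = Process(target=mutate, args=(channel, data))
    child.start()
    result = channel.get()  # receive before join to avoid a full pipe
    child.join()
    return data, result
```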

Whither CPython?


CPython doesn't have any full-time developers assigned these days - the PSF budget doesn't stretch that far (yet!). The companies that use Python (including PSF sponsor members) are generally (with a couple of notable exceptions) more interested in paying people to build applications with the versions that exist now than in paying them directly to build better versions for use in the future. That's not to say companies don't contribute code - we see plenty of corporate contributions in the form of upstream patches from Linux distro vendors like Red Hat and Canonical, as well as major users like CCP Games, and companies have sponsored particular development activities via the PSF, such as Dave Murray's work on the email enhancements that landed in 3.3. It's just that they don't tend to pay developers to think about future directions for Python in general.


Even when the PythonLabs team (IIRC, Guido van Rossum, Tim Peters, Barry Warsaw, Jeremy Hylton, Fred Drake, maybe some others) was being funded by Digital Creations/Zope Corporation:
  • it still wasn't full time for any of them
  • multi-core machines were still rare back then
  • DC/Zope's focus was on web applications, which are far more likely to be IO bound than CPU bound
In more recent years, and this is the first of the exceptions I mentioned earlier, we had Google paying Guido to spend 20 hours a week guiding the development of Python 3, but that was all about fixing the Unicode model rather than improving multi-core support.

The other exception was the Google funded Unladen Swallow effort, which aimed to bring an LLVM based JIT to CPython. While that effort did result in many improvements to LLVM, and the creation of an excellent benchmark suite for long running Python processes (much of which is still used by PyPy to this day), it ultimately failed in its original aim.

Formalising and enhancing subinterpreters

Given the high compatibility risks with existing threaded Python code and especially the risk of segfaults in C extensions that come with making CPython truly free-threaded, the Perl 5 subinterpreter model actually looks like the most promising way forward to me. With that approach, all code execution within a given interpreter is still serialised as normal, while a new communication mechanism would allow data to be safely passed between interpreters.
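The closest analogue available in pure Python today is keeping worker state thread-local and only passing data through an explicit channel - a sketch of the programming model only, since these threads still share one GIL and so won't actually run on multiple cores:

```python
# Each worker keeps its state private and hands results over through
# an explicit queue, mimicking the "isolated by default, communicate
# explicitly" style of the Perl 5 subinterpreter model.
import threading
import queue

def worker(channel, items):
    local = threading.local()  # state private to this thread
    local.total = sum(items)
    channel.put(local.total)   # the only point where data crosses over

def run_workers():
    channel = queue.Queue()
    threads = [threading.Thread(target=worker, args=(channel, chunk))
               for chunk in ([1, 2, 3], [4, 5, 6])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(channel.get() for _ in threads)
```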

Since it isn't exposed at the Python level, many developers don't realise that CPython already supports the use of subinterpreters to provide some degree of isolation between different pieces of code. The Apache mod_wsgi module uses this feature to provide some isolation between different WSGI applications running on the same Apache instance.

Unfortunately, there are currently quite a few quirks and limitations with this feature, which is why it has never been elevated to a formal part of the language specification and exposed at the Python level. In addition, the GIL is part of the state that is still shared, so exposing the feature as it exists today wouldn't help at all with concurrency goals.

That leads to my personal recommendation to anyone who would like to see better thread-based concurrency support in CPython:
  • Create a CPython fork (either by importing directly from http://hg.python.org/cpython, or by forking the BitBucket mirror).
  • Make the subinterpreter support compatible with the PyGilState APIs (Graham Dumpleton and I will actually be discussing this aspect at PyConAU next month, so I'd suggest talking to Graham before doing anything on this part)
  • Create a two-tiered locking scheme, where each interpreter (including the main interpreter) has a Subinterpreter Lock that is used to protect the main eval loop, while the Global Interpreter Lock remains in place to protect state that is shared between interpreters
  • Use the subinterpreter lock in preference to the GIL to protect most Python code evaluation
  • Design a mechanism for passing objects between interpreters without serialising or copying them. The CLR application domain design may provide some inspiration here.
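The two-tiered locking scheme in the middle bullet points could be sketched roughly as follows - note that every name here is invented for illustration, and this is Python pseudocode for a change that would actually live in CPython's C implementation:

```python
import threading

GIL = threading.RLock()  # still protects state shared between interpreters

class Subinterpreter:
    """Hypothetical sketch only - not a real CPython API."""

    def __init__(self):
        # Per-interpreter lock protecting this interpreter's eval loop
        self.lock = threading.RLock()

    def run(self, func, *args):
        # Most Python code evaluation would only need the subinterpreter
        # lock, letting different interpreters run on different cores
        with self.lock:
            return func(*args)

    def touch_shared_state(self, func, *args):
        # Access to genuinely shared runtime state still takes the GIL
        with self.lock, GIL:
            return func(*args)
```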
This is by no means an easy project, but it's the one I see as having the greatest potential for allowing CPython to exploit multiple cores effectively without requiring serialisation of data. I'll also note that whatever mechanism is designed for that last bullet point may also translate to efficient communication between local processes via memory mapped files.

But no, I'm not going to write it. Given what I work on (task automation and IO bound web and CLI applications), I don't need it personally or professionally, and it's too big a project to realistically attempt as a hobby/language PR exercise.

If you're interested in funding efforts to make something like this happen for Python 3.4 (likely coming in early-mid 2014), but don't know how to go about finding developers to work on it, then it's worth getting in touch with the PSF board. If you want better thread-based concurrency support in Python and are a Red Hat customer, it also wouldn't hurt to mention it via the appropriate channels :)

Update: Added Javascript to the VM survey.

The title of this blog

This article in praise of taking the time for idleness does a good job of articulating some of the reasons behind the title of this blog. I guard my idle time jealously - I don't like it when I have things planned in advance night after night, week after week. I want to spend my downtime doing whatever seems interesting at the time, and I don't function well if I find it necessary to go without it for an extended period.

Being bored and being lazy are widely seen as things to be avoided. However, it all depends on how you look at them.

Boredom is largely a sign of incredible luxury - a time when the world is placing no immediate demands on us, so we have to come up with some means of satisfying our innate desire to be doing something. Being bored means we're not busy obtaining food, or water, or shelter, or defending ourselves (or our food/water/shelter) from attackers, or otherwise pursuing the basic necessities of survival. It's an opportunity to play - maybe to explore (and change!) the world around us, maybe to explore fictional worlds created by others, maybe to create fictional worlds of our own, or to teach others about the real world.

The negative view on being lazy often rests on unstated assumptions (even fears) about the purpose of life: "Make more of yourself!", "Do something with your life!", "Leave your mark on the world!". When you get right down to it though, nobody (and I mean nobody) knows the meaning of life. We don't really know why it's better to get out of bed each morning and face the world - we just choose to believe that life is better than non-life, and engaging with the world is better than ignoring it. We create all sorts of stories we tell ourselves to justify our reasons for rejecting nihilism (to the point of killing each other over our choice of stories), but it ultimately comes down to a decision that the only life we know we have is this one, so we may as well do what we can to try and enjoy it while we're here. Once we make that decision, and our basic survival needs are taken care of, everything beyond that point is optional and what we pursue will depend on what we're taught to perceive as valuable.

If you look at the developed world, massive sections of it are aimed at giving people something to do when they're bored because their basic survival needs are taken care of more efficiently than they are by subsistence farming or hunter-gathering. This idle time may be spent creating new things, or consuming those things previously created by others. Some people see efficiency gains as a way to do more work in the same amount of time, but it's equally possible to exploit those gains to do the same amount of work in less time, leaving more time to be idle, and hence bored, and hence looking for other things to do. Is the former choice always better than the latter lazy choice? I don't believe so.

Retreating from the deep philosophical questions and getting back to the more mundane question of the blog title, I do own another domain that redirects to this one, and thus have occasionally tinkered with the idea of rebranding the site as Curious Efficiency. This would put a more traditionally "positive" spin on the concepts of idle investigation and elimination of unnecessary work mentioned in the blurb. However, I find the questions raised by the negative forms more intriguing, and thus the current title remains. That said, if I ever get around to using my own domain for my primary email address, it will definitely be curiousefficiency rather than boredomandlaziness :)