Since writing my Python 3 Q & A, including some thoughts on why the CPython GIL isn't likely to go away any time soon, I've been pondering the question of free-threaded cross platform virtual machines for dynamic languages. Specifically, I've been trying to think of any examples of such that are driven almost entirely by volunteer based development.
A brief VM survey
The JVM and Dalvik have plenty of full time developers, and the CLR provided by Microsoft not only has full time developers, but also isn't cross platform.
Mono's core development was funded directly by first Ximian, then Novell and now Xamarin, and since the CLR is free-threaded, free-threading support would have been a requirement from the start.
However, if we switch over to the dynamic language specific VM side, the reference implementations for both Python and Ruby use a Global Interpreter Lock to ease maintenance and maximise speed of execution in the original single-threaded scripting use case. This means neither can scale to multiple cores without using either multiple processes and some form of inter-process communications, or else invoking code that doesn't need to hold the interpreter lock (e.g. C extensions for CPython).
Both Python and Ruby have JVM and CLR implementations that are free-threaded (Jython, JRuby, IronPython, IronRuby), since they can take advantage of the cross platform threading primitives in the underlying corporate sponsored VM.
Rubinius, with Engine Yard's backing, is creating a free-threaded Ruby interpreter in the form of Rubinius 2.0. In my opinion, they've done something smart by avoiding the Win32 API entirely and just writing POSIX code, leaving the task of dealing with Microsoft's idiosyncratic approach to OS interfaces as a problem for the MinGW developers. Unfortunately (from the point of view of this particular problem), CPython long ago adopted the approach of treating Windows as a first class native build target, rather than requiring the use of a third party POSIX compatibility layer.
PyPy is heading in a different direction, focusing on making Software Transactional Memory a viable approach to concurrency in Python, without the well-known data corruption and deadlock pitfalls of thread-based concurrency.
Lua doesn't support native threading in the core VM at all - it just has a couple of GIL hooks that are no-ops by default, but can be overridden to implement a GIL.
Perl 5 supports threads using the subinterpreter model - by default, all state is thread local and you have to take explicit steps to make it visible to other threads. Perl also warns that using threads may lead to segfaults when using non-thread-safe modules.
Parrot (and thus Perl 6) has a rather ambitious concurrency model, but I have no idea how well it works in practice. With Perl 6 still in development, are there any documented production deployments?
CPython doesn't have any full time developers assigned these days - the PSF budget doesn't stretch that far (yet!), and the companies that use Python (including PSF sponsor members) are generally (with a couple of notable exceptions) more interested in paying people to build applications with the versions that exist now rather than paying them directly to build better versions for use in the future. That's not to say companies don't contribute code (we see plenty of corporate contributions in the form of upstream patches from Linux distro vendors like Red Hat and Canonical, as well as major users like CCP Games, and companies have sponsored particular development activities via the PSF, such as Dave Murray's work on email enhancements that landed in 3.3), just that they don't tend to pay developers to think about future directions for Python in general.
Even when the PythonLabs team (IIRC, Guido van Rossum, Tim Peters, Barry Warsaw, Jeremy Hylton, Fred Drake, maybe some others) were being funded by Digital Creations/Zope Corporation:
In more recent years, and this is the first of the exceptions I mentioned earlier, we had Google paying Guido to spend 20 hours a week guiding the development of Python 3, but that was all about fixing the Unicode model rather than improving multi-core support.
- it still wasn't full time for any of them
- multi-core machines were still rare back then
- DC/Zope is focused on web applications, which are far more likely to be IO bound than CPU bound
The other exception was the Google funded Unladen Swallow effort, which aimed to bring an LLVM based JIT to CPython. While that effort did result in many improvements to LLVM, and the creation of an excellent benchmark suite for long running Python processes (much of which is still used by PyPy to this day), it ultimately failed in its original aim.
Formalising and enhancing subinterpretersGiven the high compatibility risks with existing threaded Python code and especially the risk of segfaults in C extensions that come with making CPython truly free-threaded, the Perl 5 subinterpreter model actually looks like the most promising way forward to me. With that approach, all code execution within a given interpreter is still serialised as normal, while a new communication mechanism would allow data to be safely passed between interpreters.
Since it isn't exposed at the Python level, many developers don't realise that CPython already supports the use of subinterpreters to provide some degree of isolation between different pieces of code. The Apache mod_wsgi module uses this feature to provide some isolation between different WSGI applications running on the same Apache instance.
Unfortunately, there are currently quite a few quirks and limitations with this feature, which is why it has never been elevated to a formal part of the language specification and exposed at the Python level. In addition, the GIL is part of the state that is still shared, so exposing the feature as it exists today wouldn't help at all with concurrency goals.
That leads to my personal recommendation to anyone that would like to see better thread-based concurrency support in CPython:
This is by no means an easy project, but it's the one I see as having the greatest potential for allowing CPython to exploit multiple cores effectively without requiring serialisation of data. I'll also note that whatever mechanism is designed for that last bullet point may potentially translate to efficient communication between local processes via memory mapped files.
- Create a CPython fork (either by importing directly from http://hg.python.org/cpython, or by forking the BitBucket mirror).
- Make the subinterpreter support compatible with the PyGilState APIs (Graham Dumpleton and I will actually be discussing this aspect at PyConAU next month, so I'd suggest talking to Graham before doing anything on this part)
- Create a two-tiered locking scheme, where each interpreter (including the main interpreter) has a Subinterpreter Lock that is used to protect the main eval loop, while the Global Interpreter Lock remains in place to protect state that is shared between interpreters
- Use the subinterpreter lock in preference to the GIL to protect most Python code evaluation
- Design a mechanism for passing objects between interpreters without serialising or copying them. The CLR application domain design may provide some inspiration here.
But no, I'm not going to write it. Given what I work on (task automation and IO bound web and CLI applications), I don't need it personally or professionally, and it's too big a project to realistically attempt as a hobby/language PR exercise.
If you're interested in funding efforts to make something like this happen for Python 3.4 (likely coming in early-mid 2014), but don't know how to go about finding developers to work on it, then it's worth getting in touch with the PSF board. If you want better thread-based concurrency support in Python and are a Red Hat customer, it also wouldn't hurt to mention it via the appropriate channels :)