Volunteer developed free-threaded cross platform virtual machines?

Since writing my Python 3 Q & A, including some thoughts on why the CPython GIL isn't likely to go away any time soon, I've been pondering the question of free-threaded cross platform virtual machines for dynamic languages. Specifically, I've been trying to think of any examples of such that are driven almost entirely by volunteer based development.

A brief VM survey


The JVM and Dalvik have plenty of full time developers, and the CLR provided by Microsoft not only has full time developers, but also isn't cross platform.
Mono's core development was funded directly by first Ximian, then Novell and now Xamarin, and since the CLR is free-threaded, free-threading support would have been a requirement from the start.

However, if we switch over to the dynamic language specific VM side, the reference implementations for both Python and Ruby use a Global Interpreter Lock to ease maintenance and maximise speed of execution in the original single-threaded scripting use case. This means neither can scale to multiple cores without using either multiple processes and some form of inter-process communications, or else invoking code that doesn't need to hold the interpreter lock (e.g. C extensions for CPython).
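For CPython, that typically means something like the following minimal sketch: spread CPU bound work across worker processes (each with its own GIL), paying the cost of pickling arguments and results between them.

    # A minimal sketch of the standard CPython workaround: use processes (each
    # with its own GIL) rather than threads for CPU bound work. Arguments and
    # results are pickled and copied between processes instead of being shared.
    from multiprocessing import Pool

    def cpu_bound_task(n):
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        pool = Pool()                                       # one worker process per core by default
        results = pool.map(cpu_bound_task, [10 ** 6] * 8)   # runs across all available cores
        pool.close()
        pool.join()
        print(results)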

Both Python and Ruby have JVM and CLR implementations that are free-threaded (Jython, JRuby, IronPython, IronRuby), since they can take advantage of the cross platform threading primitives in the underlying corporate sponsored VM.

Rubinius, with Engine Yard's backing, is creating a free-threaded Ruby interpreter in the form of Rubinius 2.0. In my opinion, they've done something smart by avoiding the Win32 API entirely and just writing POSIX code, leaving the task of dealing with Microsoft's idiosyncratic approach to OS interfaces as a problem for the MinGW developers. Unfortunately (from the point of view of this particular problem), CPython long ago adopted the approach of treating Windows as a first class native build target, rather than requiring the use of a third party POSIX compatibility layer.

PyPy is heading in a different direction, focusing on making Software Transactional Memory a viable approach to concurrency in Python, without the well-known data corruption and deadlock pitfalls of thread-based concurrency.

Lua doesn't support native threading in the core VM at all - it just has a couple of GIL hooks that are no-ops by default, but can be overridden to implement a GIL.

Perl 5 supports threads using the subinterpreter model - by default, all state is thread local and you have to take explicit steps to make it visible to other threads. Perl also warns that using threads may lead to segfaults when using non-thread-safe modules.

Parrot (and thus Perl 6) has a rather ambitious concurrency model, but I have no idea how well it works in practice. With Perl 6 still in development, are there any documented production deployments?

Javascript doesn't support full shared memory threading, only Web Workers. Since objects have to be serialised for inter-thread communication, the model is closer to lightweight processes than it is to shared memory threading.

Whither CPython?


CPython doesn't have any full time developers assigned these days - the PSF budget doesn't stretch that far (yet!), and the companies that use Python (including PSF sponsor members) are generally (with a couple of notable exceptions) more interested in paying people to build applications with the versions that exist now rather than paying them directly to build better versions for use in the future. That's not to say companies don't contribute code (we see plenty of corporate contributions in the form of upstream patches from Linux distro vendors like Red Hat and Canonical, as well as major users like CCP Games, and companies have sponsored particular development activities via the PSF, such as Dave Murray's work on email enhancements that landed in 3.3), just that they don't tend to pay developers to think about future directions for Python in general.


Even when the PythonLabs team (IIRC, Guido van Rossum, Tim Peters, Barry Warsaw, Jeremy Hylton, Fred Drake, maybe some others) were being funded by Digital Creations/Zope Corporation:
  • it still wasn't full time for any of them
  • multi-core machines were still rare back then
  • DC/Zope was focused on web applications, which are far more likely to be IO bound than CPU bound
In more recent years, and this is the first of the exceptions I mentioned earlier, we had Google paying Guido to spend 20 hours a week guiding the development of Python 3, but that was all about fixing the Unicode model rather than improving multi-core support.

The other exception was the Google funded Unladen Swallow effort, which aimed to bring an LLVM based JIT to CPython. While that effort did result in many improvements to LLVM, and the creation of an excellent benchmark suite for long running Python processes (much of which is still used by PyPy to this day), it ultimately failed in its original aim.

Formalising and enhancing subinterpreters

Given the high compatibility risks with existing threaded Python code and especially the risk of segfaults in C extensions that come with making CPython truly free-threaded, the Perl 5 subinterpreter model actually looks like the most promising way forward to me. With that approach, all code execution within a given interpreter is still serialised as normal, while a new communication mechanism would allow data to be safely passed between interpreters.

Since it isn't exposed at the Python level, many developers don't realise that CPython already supports the use of subinterpreters to provide some degree of isolation between different pieces of code. The Apache mod_wsgi module uses this feature to provide some isolation between different WSGI applications running on the same Apache instance.

Unfortunately, there are currently quite a few quirks and limitations with this feature, which is why it has never been elevated to a formal part of the language specification and exposed at the Python level. In addition, the GIL is part of the state that is still shared, so exposing the feature as it exists today wouldn't help at all with concurrency goals.

That leads to my personal recommendation to anyone that would like to see better thread-based concurrency support in CPython:
  • Create a CPython fork (either by importing directly from http://hg.python.org/cpython, or by forking the BitBucket mirror).
  • Make the subinterpreter support compatible with the PyGilState APIs (Graham Dumpleton and I will actually be discussing this aspect at PyConAU next month, so I'd suggest talking to Graham before doing anything on this part)
  • Create a two-tiered locking scheme, where each interpreter (including the main interpreter) has a Subinterpreter Lock that is used to protect the main eval loop, while the Global Interpreter Lock remains in place to protect state that is shared between interpreters
  • Use the subinterpreter lock in preference to the GIL to protect most Python code evaluation
  • Design a mechanism for passing objects between interpreters without serialising or copying them. The CLR application domain design may provide some inspiration here.
This is by no means an easy project, but it's the one I see as having the greatest potential for allowing CPython to exploit multiple cores effectively without requiring serialisation of data. I'll also note that whatever mechanism is designed for that last bullet point may potentially translate to efficient communication between local processes via memory mapped files.
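To make that last observation a little more concrete, here's a small, runnable sketch of sharing data between two local processes via a memory mapped file, with no serialisation of the payload itself (the file name and buffer size are arbitrary illustrative choices):

    # A small illustration of the memory mapped file idea: the parent and child
    # processes share the same buffer directly, so the payload is never pickled.
    import mmap
    import os
    from multiprocessing import Process

    SIZE = 1024

    def child(path):
        with open(path, "r+b") as f:
            buf = mmap.mmap(f.fileno(), SIZE)
            buf[0:5] = b"hello"        # immediately visible to the parent
            buf.close()

    if __name__ == "__main__":
        path = "shared.buf"
        with open(path, "wb") as f:
            f.write(b"\0" * SIZE)
        p = Process(target=child, args=(path,))
        p.start()
        p.join()
        with open(path, "r+b") as f:
            buf = mmap.mmap(f.fileno(), SIZE)
            print(buf[0:5])            # b'hello'
            buf.close()
        os.remove(path)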

But no, I'm not going to write it. Given what I work on (task automation and IO bound web and CLI applications), I don't need it personally or professionally, and it's too big a project to realistically attempt as a hobby/language PR exercise.

If you're interested in funding efforts to make something like this happen for Python 3.4 (likely coming in early-mid 2014), but don't know how to go about finding developers to work on it, then it's worth getting in touch with the PSF board. If you want better thread-based concurrency support in Python and are a Red Hat customer, it also wouldn't hurt to mention it via the appropriate channels :)

Update: Added Javascript to the VM survey.

The title of this blog

This article in praise of taking the time for idleness does a good job of articulating some of the reasons behind the title of this blog. I'm very jealous of my idle time - I don't like it when I have things planned in advance night after night, week after week. I want my downtime to just do whatever seems interesting at the time, and I don't function well if I find it necessary to go without it for an extended period.

Being bored and being lazy are widely seen as things to be avoided. However, it all depends on how you look at them.

Boredom is largely a sign of incredible luxury - a time when the world is placing no immediate demands on us, so we have to come up with some means of satisfying our innate desire to be doing something. Being bored means we're not busy obtaining food, or water, or shelter, or defending ourselves (or our food/water/shelter) from attackers, or otherwise pursuing the basic necessities of survival. It's an opportunity to play - maybe to explore (and change!) the world around us, maybe to explore fictional worlds created by others, maybe to create fictional worlds of our own, or to teach others about the real world.

The negative view on being lazy often rests on unstated assumptions (even fears) about the purpose of life: "Make more of yourself!", "Do something with your life!", "Leave your mark on the world!". When you get right down to it though, nobody (and I mean nobody) knows the meaning of life. We don't really know why it's better to get out of bed each morning and face the world - we just choose to believe that life is better than non-life, and engaging with the world is better than ignoring it. We create all sorts of stories we tell ourselves to justify our reasons for rejecting nihilism (to the point of killing each other over our choice of stories), but it ultimately comes down to a decision that the only life we know we have is this one, so we may as well do what we can to try and enjoy it while we're here. Once we make that decision, and our basic survival needs are taken care of, everything beyond that point is optional and what we pursue will depend on what we're taught to perceive as valuable.

If you look at the developed world, massive sections of it are aimed at giving people something to do when they're bored because their basic survival needs are taken care of more efficiently than they are by subsistence farming or hunter-gathering. This idle time may be spent creating new things, or consuming those things previously created by others. Some people see efficiency gains as a way to do more work in the same amount of time, but it's equally possible to exploit those gains to do the same amount of work in less time, leaving more time to be idle, and hence bored, and hence looking for other things to do. Is the former choice always better than the latter lazy choice? I don't believe so.

Retreating from the deep philosophical questions and getting back to the more mundane question of the blog title, I do own another domain that redirects to this one, and thus have occasionally tinkered with the idea of rebranding the site as Curious Efficiency. This would put a more traditionally "positive" spin on the concepts of idle investigation and elimination of unnecessary work mentioned in the blurb. However, I find the questions raised by the negative forms more intriguing, and thus the current title remains. That said, if I ever get around to using my own domain for my primary email address, it will definitely be curiousefficiency rather than boredomandlaziness :)

Django's CBVs are not a mistake (but deprecating FBVs might be)

This interesting piece from Luke Plant went by on Planet Python this morning, and really helped me in understanding many of the complaints I see about Django's Class Based Views. The problem seems to be that when CBVs were introduced, they were brought in as a replacement for the earlier procedural Function Based Views, rather than as a lower level supplemental API that covered an additional set of use cases that weren't being adequately served by the previous approach (I only started using Django with 1.3, so it's taken me a while to come up to speed on this aspect of the framework's history).

The key point in Luke's article that I agree with is that deprecating FBVs in favour of CBVs and saying the latter is always the superior solution is a mistake. The part I disagree with is that saying this also means that introducing the CBV concept itself was a mistake. CBVs may have been oversold as the "one true way" to do Django views, but "There's one - and preferably only one - obvious way to do it" is not meant to apply at the level of programming paradigms. Yes, it's a design principle that's part of the Zen of Python, and it's a good philosophy to help reduce needless API complication, but when it comes to the complexities of real world programming, you need flexibility in your modelling tools, or you end up fighting the limitations of your tools instead of being able to clearly express your intent.

Procedural programming, functional programming, object-oriented programming, pipeline-based programming etc - they're all different ways to approach a problem space, and Python is deliberately designed to support all of them.

It helps to know a bit of programming history and the origins of OOP in the context of this discussion, as Django's FBVs are very similar to implementations of OOP in C and other languages with no native OOP support: you have an object (the HTTP request) and a whole lot of functions that accept that object as their first argument.

Thus, when you don't use CBVs at all, what you're really doing is bypassing Python's native OO support in favour of a truckload of what are effectively methods on request objects (just written in a procedural style). If you want to pass state around you either store it on the request, you store it in global state (which includes your cache and main datastore) or you pass it explicitly as function arguments (which means you have to daisy chain it to anyone else that needs it). If you use classes instead, then you get an additional mechanism that you can use to affect behaviour for a subset of your views. For example, I recently restricted write access to the PulpDist REST API to site admins, when it had previously been open to all logged in users. I could do that in one place and be confident it affected the entire API because every REST view in PulpDist inherits from a common base class. Since that base class now enforces the new access restrictions, the entire API obeys the rules even though I only changed one class.
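As a simplified sketch of that kind of change (the class and method names below are illustrative, not taken from PulpDist), a shared base class can impose a rule on every view that inherits from it:

    # An illustrative sketch (hypothetical names, not the actual PulpDist code)
    # of enforcing an access rule in one place via a common view base class.
    from django.core.exceptions import PermissionDenied
    from django.http import HttpResponse
    from django.views.generic import View

    class AdminWriteOnlyView(View):
        """Writes are restricted to site admins; reads are left unchanged."""
        def dispatch(self, request, *args, **kwargs):
            if request.method not in ("GET", "HEAD", "OPTIONS"):
                if not request.user.is_staff:
                    raise PermissionDenied
            return super(AdminWriteOnlyView, self).dispatch(request, *args, **kwargs)

    class ResourceView(AdminWriteOnlyView):
        def get(self, request):
            return HttpResponse("read access unaffected")
        def post(self, request):
            return HttpResponse("only reachable by site admins now")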

Where Luke is absolutely right, though, is that switching from a procedural approach to an object-oriented one comes with a cost, mostly in the form of non-local effects and non-obvious flow control. If you look at Python's standard library, a rather common model to alleviate this problem is the idea of providing an implementation class, which you can choose to use directly, as well as a set of module level convenience functions. Much of the time, using the convenience functions is a better choice, since they're designed to be simple and clean solutions to common tasks. However, if you need to start tweaking, then being able to either instantiate or subclass the existing backend implementation directly lets you get a lot further before you have to resort to the brute force copy-paste-edit approach to code reuse.

But please, don't confuse "Django's Generic View implementation is overly complicated and FBVs should be retained as an officially blessed and supported convenience API" with "CBVs are a bad idea". Making the latter claim is really saying "OOP is a bad idea", which is not a supportable assertion (unless you want to argue with decades of CS and software engineering experience). While the weaker claim that "An OOP implementation is often best presented to the API user behind a procedural facade" is less exciting, it has the virtue of being more universally true. Procedural APIs often are simpler and generally introduce less coupling between components. The trick with exposing an OOP layer as well is that it increases the options for your users, as they can now:

  • Just use the procedural layer (Huzzah! Low coupling is good)
  • Use the OOP layer through composition (OK, better than reinventing the wheel and coupling is still fairly low when using composition)
  • Use the OOP layer through inheritance (Eek, coupling is increasing substantially now, but it's typically still better than copy-paste-edit style reuse)
  • Use the upstream implementation as a reference or starting point when writing your own (coupling drops back to zero, but the line count of code that is directly part of the current project just went up substantially)
Where Django has arguably made a mistake is in thinking that exposing an OOP layer directly is a reasonable substitute for a pre-existing procedural layer. In general, that's not going to be the case for all the reasons Luke cites in his article. Having the procedural layer become a thin veneer around the published object oriented layer would probably be a good thing, while deprecating it and actively discouraging its use, even for the cases it handles cleanly, seems potentially unwise.

A good example of this layered approach to API design is the str.format method. The main API for that is of course the str.format() method itself and that covers the vast majority of use cases. If you just want to customise the display of a particular custom type, then you can provide a __format__ method directly on that class. However, if you want to write a completely custom formatter (for example, one that automatically quotes interpolated values with shlex.quote), then the string.Formatter class is exposed so that you can take advantage of most of the builtin formatting machinery instead of having to rewrite it yourself. Contrast that with the older %-based approach to formatting - if you want to implement a custom formatter based on that, you're completely on your own, the standard library provides no help whatsoever. PEP 3101 provides some of the rationale behind the layered string formatting API. It's by no means perfect, but perfection wasn't the goal - the goal was providing something more flexible and less quirky than %-style formatting, and in that it succeeded admirably. The key lesson that's applicable to Django is that string.Formatter isn't a replacement for str.format, it's a supplement for the relatively rare cases where the simple method API isn't flexible enough.
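As a rough sketch of that custom formatter case (assuming Python 3.3's shlex.quote; pipes.quote plays the same role on earlier versions):

    # A short sketch of the custom formatter described above: every interpolated
    # value is shell-quoted automatically.
    import shlex
    import string

    class QuotingFormatter(string.Formatter):
        def format_field(self, value, format_spec):
            formatted = string.Formatter.format_field(self, value, format_spec)
            return shlex.quote(formatted)

    f = QuotingFormatter()
    print(f.format("ls -l {}", "untrusted; rm -rf /"))
    # ls -l 'untrusted; rm -rf /'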

A few other examples of this layered API design that spring immediately to mind are the logging module (which provides convenience functions to pass messages directly to the root logger), subprocess (with a few convenience functions that aim to largely hide the Swiss army knife that is subprocess.Popen), textwrap (with textwrap.fill() providing a shorthand for a particular way of using textwrap.TextWrapper), pickle, json, importlib... You get the idea :)
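For instance, a quick sketch using two of those modules:

    # The same layered pattern in two of those modules: a module level
    # convenience function for the common case, with the underlying class
    # available when more control is needed.
    import logging
    import textwrap

    logging.warning("quick and convenient: goes straight to the root logger")
    log = logging.getLogger("myapp.component")      # the class based layer
    log.setLevel(logging.DEBUG)

    text = "a longer sentence that needs to be wrapped onto several short lines"
    print(textwrap.fill(text, width=30))            # convenience function
    wrapper = textwrap.TextWrapper(width=30, subsequent_indent="    ")
    print(wrapper.fill(text))                       # the class based layer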

Update: Toned down the title and the paragraph after the bulleted list slightly. Since I've never used them myself, I don't know enough about the abuses of FBVs to second-guess the Django core devs' motivations for actively encouraging the switch to CBVs.

An embarrassment of riches

Years ago (but still within the last decade) I was involved in a source control trade study for a large multi-national corporation. Management had let a non-software developer select the original "source control tool" and they had picked something that required custom scripting just to do a baseline (I wish I was kidding).

So a bunch of candidate replacements were put forward for consideration, and CVS won because it was free, thus there would be fewer arguments with management about rolling it out on a project that was already over budget and behind schedule. (The fact that Subversion wasn't considered as a candidate should give you some additional hints about the precise timing of this - Subversion 1.0 was released in February 2004. Yes, for those that are new to this game, you read that right: it is only within the last decade that the majority of the open source VCS world began to enjoy the benefits of atomic commits).

Other interesting aspects of that system included the fact that one of the developers on that project basically had to write a custom xUnit testing system from scratch in order to start putting together a decent automated test suite for the system, there was no code review tool, and you couldn't include direct links to bug tracker items in emails or anything else - you had to reference them by name or number, and people would then look those names or numbers up in the dedicated bug tracking application client.

High level design documentation, if it existed at all, was in the form of Microsoft Word documents. Low level API documentation? Yes, that would have been nice (there were some attempts to generate something vaguely readable with Doxygen but, yeah, well, C++).

Less than ten years later, though, and there are signs our industry is starting to grow up (although I expect many enterprise shops are still paying extortionate rates to the likes of IBM for the "Rational" suite of tools only to gain a significantly inferior development experience):

  1. You can get genuinely high quality code hosting for free. Sure, SourceForge was already around back then, but Git and Mercurial stomp all over CVS from a collaboration point of view. These also come with decent issue trackers and various other collaboration tools. If you don't want to trust a service provider with your code, then tools like GitLab let you set up similar environments internally.
  2. Web based issue trackers are everywhere, with the ubiquitous "issue URL" allowing effective cross-linking between tracker issues, documentation, code comments, source control browsers, code review systems, etc.
  3. Dedicated code review tools like Gerrit and Rietveld are published as open source (and, in the case of the latter, even available as a free service on Google App Engine).
  4. Services like ReadTheDocs exist, allowing you to easily build and publish high quality documentation. All with nice URLs so you can link it from emails, tracker issues, source code, etc.
  5. Organisations like Shining Panda CI and Travis CI provide hosted continuous integration services that put the internal capabilities of many large companies to shame.
  6. Language communities provide cross-platform distribution services to reach a global audience.
  7. Depending on the language you use, you may even have tools like SonarSource available
  8. Once you go into production in the web application world, service components like Sentry, Piwik, and Graphite are again available for no charge.
And to access all this good stuff for free? All you have to do is be willing to share your work (and sometimes not even that). If you don't want to share your work, then the service providers generally have very reasonable fees - you could probably put together a state of the art suite of tools for less than a few hundred bucks a month.

Take my own hobby projects as an example:
  • they're hosted on BitBucket as Mercurial projects (I happen to prefer Mercurial, although I can definitely see why people like Git, too). That gives me integrated issue tracking and online source code browsing, too. (OK, so I could have had essentially that back in the early SourceForge days, but the UI aspects have improved in many respects in the intervening years)
  • I can publish my projects on the Python Package Index with a simple "setup.py sdist upload". They're then available for anyone in the world to install with a straightforward command like "pip install walkdir"
  • thanks to Shining Panda CI, I know the downloads from PyPI work, and I also know that the projects work on all the versions and implementations of Python I want to support
  • thanks to ReadTheDocs and Sphinx, you can read nicely formatted documentation like this rather than trying to decipher plain text files or wiki pages.
I'm living in the future and it is seriously cool (and that's just looking at things purely from a software development infrastructure point of view - the rise of "Infrastructure as a Service" and "Platform as a Service" providers, including Red Hat's own OpenShift, has massive implications on the deployment side of things, and there's of course the implications of the many open source wheels that don't need to be reinvented)

The best part from my point of view is that these days I get to work for a company that already genuinely understands the long term significance of the power of collaborative development. It also doesn't hurt that there's still a lot of money to be made in helping the rest of the enterprise world come to grips with that reality :)

contextlib2 0.4: Now with ExitStack!

Inspired by Michael Foord's efforts with unittest2, contextlib2 is a PyPI package where I am working on some new additions to the standard library's contextlib module for Python 3.3.

The most interesting of those is a replacement for the ill-fated contextlib.nested API.

If you use Python 3.2 today, you'll find that the contextlib.nested API doesn't even exist any more. The reason it was deprecated and removed is because it didn't play well with context managers that acquired their resources in __init__ rather than __enter__ (such as Python's own file objects and many other resources where using a with statement for management is an optional convenience rather than being mandatory).

The simplest example where the old API caused problems was opening a list of files and then using contextlib.nested to close them all when the operation was complete - if opening any later file threw an exception (e.g. due to a permissions error or a bad file name), then all of the earlier files would remain open until the garbage collector got around to cleaning them up. Not a huge problem on CPython with its refcounting based GC, but a far cry from the deterministic resource cleanup that context managers are supposed to offer.
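A small sketch of that failure mode, using the old (now removed) API:

    # Using the old API (Python 2.x only; removed in 3.2): every file is opened
    # *before* contextlib.nested() runs, so if opening "missing.txt" fails,
    # "a.txt" and "b.txt" are left open until the garbage collector gets to them.
    from contextlib import nested

    with nested(open("a.txt"), open("b.txt"), open("missing.txt")) as files:
        for f in files:
            print(f.readline())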

Since the deprecation and removal of contextlib.nested, there have been assorted replacement proposals of varying levels of sophistication posted to the Python ideas mailing list. The new ExitStack API in this release is my own latest effort, and the first that I've liked well enough to seriously consider as a candidate for inclusion in the standard library module.

The idea behind providing the ExitStack API is for the standard library to focus specifically on handling the one particularly tricky part of dealing with context managers programmatically: unwinding the context stack correctly, ensuring that exceptions are processed exactly as if any context managers involved had been used in an appropriate series of nested with statements.

A couple of convenience methods are included (one that enters a context manager in addition to pushing the exit method on to the stack, as well as a simple callback registration method inspired by unittest.TestCase.addCleanup), but the features of the API are otherwise fairly minimal.
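Here's a short sketch of how the file opening case above looks with the new API:

    # The same file opening case with ExitStack: exit methods are only
    # registered for files that were actually opened, so a failure partway
    # through still closes everything acquired so far (in reverse order).
    from contextlib2 import ExitStack

    def read_files(names):
        with ExitStack() as stack:
            files = [stack.enter_context(open(name)) for name in names]
            # All the files are open here; all are closed when the block exits
            return [f.read() for f in files]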

This low level dynamic API can then be used by developers to create their own higher level convenience APIs, as suggested in some of the examples in the documentation.

A few specific design notes:

  • The name ExitStack came about because the object is literally a stack of exit method references (or callback wrappers that behave like exit methods). Earlier variants were ContextStack (too narrow, since you can use the stack for standalone callbacks) and CallbackStack (too broad, since the stored callbacks specifically have the signature of exit methods)
  • The push() method accepts exit methods directly, since those are what actually gets stored on the stack. Duck typing to also accept objects with an __exit__() method is convenient without being confusing (I hope).
  • The enter_context() method uses the longer name because the shorter version is too easy to confuse with the stack's own __enter__() method.
If you have any questions about the ExitStack design, this is the place to ask. If you find any bugs or other defects, head over to the issue tracker.

WalkDir 0.3 released (for more Python versions, thanks to Shining Panda!)

WalkDir is my Python support library that aims to make it as easy to work with filtered directory listings as it is to walk over entire directory trees with os.walk().

The module's design tries to take full advantage of Python's iterator model - most of its functionality is provided by pipelined iterators that accept os.walk() style iterables and expose the same interface themselves.

The only major functional change in version 0.3 is that these pipelined iterators now make sure they pass along the objects produced by the underlying iterators, and only use indexing operations to access the individual fields. Previously they would use tuple unpacking to access the directory details, which restricted the supported types to those with exactly 3 fields and also had the side effect of replacing the underlying objects with ordinary 3-tuples.
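As a simplified illustration of that style (a sketch of the approach, not the actual WalkDir implementation):

    # A simplified sketch (not the actual WalkDir code) of a pipelined iterator:
    # it accepts an os.walk() style iterable, filters the file names in place,
    # and passes the underlying objects along untouched, using indexing rather
    # than tuple unpacking so any extra fields survive the trip.
    import fnmatch
    import os

    def include_files_matching(walk_iter, pattern):
        for item in walk_iter:
            item[2][:] = fnmatch.filter(item[2], pattern)
            yield item

    for item in include_files_matching(os.walk("."), "*.py"):
        print(item[0], item[2])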

I changed this mainly due to a new OS interface that is likely to be coming in Python 3.3: an os.walk() variant that produces a 4-tuple rather than a 3-tuple. The 4th value will be a file descriptor for the directory making it easier (in conjunction with new file descriptor based APIs in the 3.3 os module) to write filesystem modification code that is robust against symlink attacks. By passing the underlying objects through unmodified, WalkDir is now compatible with this API - all the path based filtering will still work, but the file descriptor values will also be passed along correctly.

For those that haven't seen any of my previous comments on WalkDir, the other parts of the API are just there for convenience - one factory function that constructs pipelines for you, and 3 terminal iterators that flatten out the os.walk() style triples into a simple series of paths (all paths, just the visited directories or just the file paths).
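The terminal iterators follow the same pattern; an illustrative sketch (again, not the actual WalkDir code) of the file path case:

    # An illustrative sketch of a terminal iterator: flatten os.walk() style
    # output into a plain series of file paths.
    import os

    def iter_file_paths(walk_iter):
        for item in walk_iter:
            for name in item[2]:
                yield os.path.join(item[0], name)

    for path in iter_file_paths(os.walk(".")):
        print(path)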

The other notable change in 0.3 is the list of officially supported versions. Previously, the module was only known to work on 2.7 and 3.2+ (since they're the versions I have on my home development machine). However, thanks to a free open source account provided by the folks at Shining Panda, WalkDir 0.3 is known to work on Python 2.6, 2.7 and 3.1+ (I even test it on PyPy and Stackless, just because I can). After pushing a broken package to PyPI for 0.2, I even have a sanity check I can run that ensures the module can be downloaded with pip and then imported on all the supported versions.

Using the SOPA protests to highlight related problems in Australia

I figure this is the easiest place to publish the message I just sent to Larissa Waters, the Greens Senator who is one of Queensland's representatives in the Federal Senate. I also wrote to Yvette D'Ath (our local MHR) a few days ago, but I didn't keep a copy of that one. Will this achieve anything? Probably not, but hey, at least I tried (and if none of their constituents ever write to them about it, our reps are quite entitled to assume we're all OK with them selling out the country to legacy US media interests):

Senator Waters,

With today being the day Wikipedia and a wide range of other sites have either gone dark or taken other action to protest draconian internet censorship legislation making its way through the US Congress, it seems an opportune time to highlight our own government's ongoing concerning behaviour on that front.

Of particular concern is their continuing refusal to release details of a secretive meeting between government representatives and representatives of the same organisations that are behind the draconian US bills currently being protested. The government even deliberately excluded representatives of a number of community interest organisations that sought to attend these discussions.

These legacy media companies (aka horse drawn carriage manufacturers) are flailing around wildly as the rise of free and open digital communications networks (aka automobiles) threatens the cherished gatekeeper role they have enjoyed for the past few decades as media distributors. They have failed to adapt, and are increasingly being bypassed as artists, writers, musicians, comedians and other media creators find ways to use the power of the internet to connect more directly with their fans. These direct connections are great for both artists and fans, but place the intermediaries like YouTube, Apple iTunes, Amazon, BandCamp, Flickr, etc, in the role of service providers to the artists and fans rather than gatekeepers to widespread distribution. Unfortunately, instead of going gracefully into that good night, these organisations are investing inordinate sums of money worldwide in lobbying for legislation that would make the permissive, open practices of most of these new service providers a recipe for prohibitively high legal liabilities, effectively making those practices unsustainable and thus breaking the internet as we know it today.

Australia already markedly shifted many intellectual monopoly policies to favour the interests of US copyright holders at the expense of Australian citizens when we signed the US-FTA some time ago. We have also participated in the secretive process of drafting the Anti-Counterfeiting Trade Agreement, which spends far more time considering digital copyright infringement than it does actual counterfeiting. The current negotiations over membership in the Trans-Pacific Partnership agreement raise legitimate fears that Australia's intellectual monopoly policy will be shifted even further towards the draconian position of the United States Trade Representative, even as those policies are being protested strongly within the US itself.

In line with your published policy on community participation in government, do the Greens plan to publicly question the government over their apparent willingness to place the interest of large US companies ahead of those of individual Australian citizens?

Regards,
Nick.

New Year Python Meme - December 2011

I'm normally a curmudgeon about this kind of thing, but I enjoyed reading some of the other posts in this series Tarek kicked off, so I decided to make my own contribution.

1. What's the coolest Python application, framework or library you have discovered in 2011?
The move to Red Hat marked my entry into the world of web development (previously I'd merely been an interested observer of that world, rather than a participant). By far my favourite discovery since making that change is django-rest-framework - with that, I can use my web browser to browse early iterations of my server's REST API directly, without needing to write custom clients to process the JSON data from APIs that are still in a state of flux.

As a service, ReadTheDocs has been an absolute revelation - between that, code hosting & issue management sites like BitBucket and GitHub and of course PyPI itself, it's now possible for an open source project to have a quite respectable web presence without the developers needing to understand anything more than Sphinx, source control and the project they're working on.

2. What new programming technique did you learn in 2011?
REST would be the big one. I'd had some general exposure to the concept in the past, but there's no substitute for sitting down and building it into a product when it comes to understanding a programming or API design technique.

3. What's the name of the open source project you contributed the most in 2011? What did you do?
CPython, by far - kibbitzing on python-dev and python-ideas (and import-sig too these days), writing and reviewing several different PEPs, documentation updates, code reviews and patch applications, as well as working on my own things (including the still-in-progress integration work for the 'yield from' expression that's coming in 3.3).

I also recently started up 4 separate open source projects - 3 PyPI modules to hopefully address deficiencies I see in the current standard library offerings, plus the upstream open source project for my current development efforts at Red Hat:
  • contextlib2 (ContextStack has some potential as a new building block)
  • WalkDir (the idea here is to be the "itertools for os.walk()")
  • Shell Command (let Python handle the control flow, and the shell the actual commands)
  • PulpDist (Bringing a semblance of order to small-scale rsync mirror networks)

4. What was the Python blog or website you read the most in 2011?
Planet Python.

5. What are the three top things you want to learn in 2012?
From a work point of view, getting my RHCSA (Red Hat Certified System Administrator) is at the top of the list. Coming up to speed on AMQP (Advanced Message Queuing Protocol) is a close second. Finally, I want to fill in more of the gaps in my very sketchy knowledge of web UI development (i.e. HTML/CSS/Javascript).

6. What are the top software, app or lib you wish someone would write in 2012?
I want to see the __preview__ namespace (in particular, the regex module) make it into Python 3.3. But that requires a volunteer to step up and write the PEP, write the code and generally champion the idea (if we have to wait for me to do it, there's no way it will happen before 3.4).

Want to do your own list? Here's how:
  1. copy-paste the questions and answer them in your blog
  2. tweet it with the #2012pythonmeme hashtag

Help improve the Python 3.3 Standard Library...

... and hopefully help yourself with current programming projects, too.

Some recent programming activities left me underwhelmed by a few of the standard library's included batteries. This has already led to a significant revamp of the subprocess module documentation to steer new users away from the Popen Swiss army knife (unless they really need it) and to explain the commonly needed parameters more clearly. It still needs work (the notes and warnings are far too repetitive), but it at least introduces things in the right order now (high level convenience API that most people want first, lower level Popen API that some people need second).
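In code, that ordering looks roughly like this (a quick sketch; check_output requires Python 2.7 or later):

    # The "convenience first" ordering in practice: the high level helpers cover
    # most use cases, while Popen remains available for the genuinely tricky ones.
    import subprocess

    subprocess.check_call(["ls", "-l"])              # run a command, raise on failure
    listing = subprocess.check_output(["ls", "-l"])  # run and capture stdout (2.7+)

    # The lower level Swiss army knife, for custom pipes and asynchronous interaction
    proc = subprocess.Popen(["grep", "py"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(listing)
    print(out)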

However, for 3.3 I'd like to improve things even more in at least three areas: invocation of the system shell for administration tasks, better tools for traversing filesystem directories and programmatic management of deterministic resource cleanup (i.e. not relying on the garbage collector).

Accordingly, I have 3 projects up on PyPI (with docs on ReadTheDocs and source control and issue tracking on BitBucket):

  • WalkDir: os.walk() style iterators with file and directory filtering (both inclusion and exclusion), depth limiting and symlink loop detection, as well as convenience iterators to flatten os.walk() style iterators into a series of paths (either all walked paths, just the directories or just the files). I currently plan to make (at least some of) these part of the shutil module, but exactly what gets added will be based on the feedback I receive on this module and its API design.
  • Shell Command: Convenience APIs that combine subprocess invocation with string interpolation. Interpolated strings are escaped with shlex.quote() by default, with a custom conversion specifier ("!u", for unquoted) used to invoke the standard interpolation process. It also features an experimental API where I'm tinkering with the use of select.select() on subprocess pipes (I'm not sure it achieves a lot over simple blocking IO in its current form, though). The current plan for this API is that it will be added directly to the subprocess module (well, the stable and sensible parts will be, anyway - I still have my doubts about the select.select() experiment)
  • contextlib2: This module basically exists to let me publish and gather feedback on ContextStack, a proposed addition to contextlib for 3.3 that should make it easier to manage deterministic resource cleanup programmatically (i.e. without coupling it as directly to code layout as simple with statements do).

Feedback on any and all of these is appreciated, either here or on the respective issue trackers. It isn't a foregone conclusion that any of these APIs will be added at all, so examples of real world use cases would definitely be helpful.