Stop Supporting Python 2.6 (For Free)

(Note: I'm speaking with my "CPython core developer" hat on in this article, rather than my "Red Hat employee" one, although it's the latter role that gave me broad visibility into the Fedora/RHEL/CentOS Python ecosystem)

Alex Gaynor recently raised some significant concerns in relation to his perception that Red Hat expects the upstream community to support our long term support releases for as long as we do, only without getting paid for it.

That's not true, so I'm going to say it explicitly: if you're currently supporting Python 2.6 for free because folks using RHEL 6 or one of its derivatives say they need it, and this is proving to be a hassle for you, then stop. If they complain, then point them at this post, as providing an easily linkable reference for that purpose is one of the main reasons I'm writing it. If they still don't like it, then you may want to further suggest that they come argue with me about it, and leave you alone.

The affected users have more options than they may realise, and upstream open source developers shouldn't feel obliged to donate their own time to help end users cope with organisations that aren't yet able to upgrade their internal infrastructure in a more timely fashion.
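
For projects that do decide to drop Python 2.6, one way to make the decision explicit is a small version gate that fails early with a clear message instead of an obscure syntax or import error. The following is only a minimal sketch under stated assumptions: the `MINIMUM_SUPPORTED` policy value and the `unsupported_message` helper are hypothetical names chosen for illustration, not part of any particular project or packaging tool.

```python
import sys

# Hypothetical policy value: the oldest Python version the project supports.
MINIMUM_SUPPORTED = (2, 7)


def unsupported_message(version_info, minimum=MINIMUM_SUPPORTED):
    """Return an explanatory message if version_info is too old, else None."""
    current = tuple(version_info[:2])
    if current >= minimum:
        return None
    # current and minimum are both 2-tuples, so their concatenation
    # supplies the four integers the format string expects.
    return (
        "Python %d.%d is no longer supported by this project; "
        "Python %d.%d or later is required." % (current + minimum)
    )


if __name__ == "__main__":
    # Exit with the message (and a non-zero status) on unsupported versions.
    message = unsupported_message(sys.version_info)
    if message is not None:
        sys.exit(message)
```

A check like this could sit at the top of an installation or entry point script. The message text is also a natural place to link to whatever explanation the project wants to give affected users.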

Red Hat Supported Python Upgrade Paths

Since September 2013, Red Hat Enterprise Linux subscriptions have included access to an additional component called Red Hat Software Collections. You can think of Software Collections roughly as "virtualenv for the system package manager", providing access to newer language runtimes (including Python 2.7 and 3.3), database runtimes, and web server runtimes, all without interfering with the versions of those integrated with (and used by) the operating system layer itself.

This model (and the fact they're included with the base Red Hat Enterprise Linux subscription) means that Red Hat subscribers are able to install and use these newer runtimes without needing to upgrade the entire operating system.

Since June 2014, Red Hat Enterprise Linux 7 has also been available, including an upgrade of the system Python to Python 2.7. The latest release of that is Red Hat Enterprise Linux 7.1.

As Red Hat subscriptions all include free upgrades to new releases, the main barrier to deployment of these newer Python releases is institutional inertia. While it's entirely admirable that many upstream developers are generous enough to help their end users work around this inertia, in the long run doing so is detrimental for everyone concerned, as long term sustaining engineering for old releases is genuinely demotivating for upstream developers (it's a good job, but a lousy way to spend your free time), and for end users, working around institutional inertia this way reduces the pressure to actually get the situation addressed properly.

Beyond Red Hat Enterprise Linux, Red Hat's portfolio also includes both the OpenShift Platform-as-a-Service offering and the FeedHenry Mobile Application Platform. For many organisations looking to adopt an iterative approach to web service development, those are going to be a better fit than deploying directly to Red Hat Enterprise Linux and building a custom web service management system around that.

Third Party Supported Python Upgrade Paths

The current Red Hat supported upgrade paths require administrative access to a system. While it's aimed primarily at scientific users, the comprehensive Anaconda Python distribution from Continuum Analytics is a good way to obtain prebuilt versions of Python for Red Hat Enterprise Linux that can be installed by individual users without administrative access.

At Strata 2015, Continuum's Python distribution not only featured in a combined announcement regarding on-premise deployment of Anaconda Cluster together with Red Hat Storage, but also in Microsoft's announcement of Python support in the Azure Machine Learning service.

For users that don't need a full scientific Python distribution, Continuum Analytics also offer Miniconda, which just provides a Python runtime and the conda package manager, giving end users a cross-platform way to obtain and manage multiple Python runtimes without needing administrative access to their systems.

Community Supported Python Upgrade Paths

The question of providing upgrade paths for folks without an active Red Hat subscription has also been taken into consideration.

In January 2014 Red Hat became an official sponsor of the long established CentOS project, with the aim of providing a stable base for community open source innovation above the operating system layer (this aim contrasts with the aims of the Fedora project, which is intended primarily to drive innovation within the operating system layer itself).

CentOS 7 was originally released in July 2014, and the latest release, CentOS 7 (1503), was just published (as the name suggests) in March 2015.

For CentOS and other RHEL derivatives, the upstream project for Red Hat Software Collections is publicly hosted, making these collections available to the whole Fedora/RHEL/CentOS ecosystem, rather than only being available to Red Hat subscribers.

Folks running Fedora Rawhide that are particularly keen to be on the cutting edge of Python can even obtain prerelease Python 3.5 nightly builds as a software collection from Miro Hrončok's Fedora COPR repository.

In addition to maintaining Fedora itself, the Fedora community also maintains the Extra Packages for Enterprise Linux (EPEL) repositories, providing ready access to non-conflicting packages beyond those included in the base Red Hat Enterprise Linux and CentOS releases.

Providing Commercial Red Hat Enterprise Linux Support

If projects are regularly receiving requests for support on Red Hat Enterprise Linux and derived platforms, and the developers involved are actively looking to build a sustainable business around their software, then a steady stream of these requests may represent an opportunity worth exploring. After all, Red Hat's subscribers all appreciate the value that deploying commercially supported open source software can bring to an organisation, and are necessarily familiar with the use of software subscriptions to support sustaining engineering and the ongoing development of new features for the open source software that they deploy.

One of the key things that many customers are looking for is pre-release integration testing to provide some level of assurance that the software they deploy will work in their environment, while another is a secure development and distribution pipeline that ensures that the software they install is coming from organisations that they trust.

One of Red Hat's essential tools for managing this distributed integration testing and content assurance effort is its Partner Program. This program is designed to not only assist Red Hat subscribers in finding supported software that meets their needs, but also to provide Red Hat, partners, and customers with confidence that the components of deployed solutions will work well together in target deployment environments.

Specifically for web service developers that would like to provide a supported on-premise offering, last year's announcement of a Container Certification Program (with Red Hat Enterprise Linux 7 and the OpenShift Platform-as-a-Service offering as certified container hosts) extended Red Hat's certification programs to cover the certification of Docker containers in addition to other forms of Linux application deployment.

Even more recently, the Red Hat Container Development Kit was introduced to help streamline that certification process for Red Hat Independent Software Vendor Partners.

But these are all things that folks should only explore if they're specifically interested in building a commercial support business around their software. If users are trying to get long term maintenance support for a community project for free, then upstream developers should be sending a single unified message in response: don't assume you'll be able to run new versions of open source software on old platforms unless you're specifically paying someone to ensure that happens.

I'm not saying this because I work for a platform vendor that gets paid (at least in part) to do this, I'm saying it because most open source projects are maintained by innovators that are upgrading their technology stacks regularly, where versions of components are already old after 2 years and truly ancient after 5. Expecting open source innovators to provide long term maintenance for free is simply unreasonable - regularly upgrading their own stacks means they don't need this long term platform support for themselves, so the folks that are seeking it should be expected to pay for it to happen. (Apparent exceptions like CentOS aren't exceptions at all: sustaining engineering on CentOS is instead a beneficial community byproduct of the paid sustaining engineering that goes into Red Hat Enterprise Linux).

Abusing Contributors is not OK

As reported in Ars Technica, the ongoing efforts to promote diversity in open source communities came up once more during the plenary Q&A session with Linus Torvalds, Andrew Tridgell, Bdale Garbee and Rusty Russell.

I was there for that session, and found that Linus's response appeared to betray a fundamental misunderstanding of the motives of many of the folks pushing for increased diversity in the open source community, as well as a lack of awareness of the terrible situations that can arise when leaders in a community regularly demonstrate abusive behaviour without suffering any significant consequences (just ask folks like Kathy Sierra, Zoe Quinn, Anita Sarkeesian and Brianna Wu that have been subjected to sustained campaigns of harassment largely for being women that dared to have and express an opinion on the internet).

As the coordinator of the Python Software Foundation's contribution to the 2015 financial assistance program, and as someone with a deep personal interest in the overall success of the open source community, I feel it is important for me to state explicitly that I consider Linus's level of ignorance around appropriate standards of community conduct to be unacceptable in an open source community leader in 2015.

Linus's defence of his abusive behaviour is that he's "not nice", and "doesn't care about you". He does care deeply about his project, though, and claims to be motivated primarily by wanting that to continue to be successful.

To be completely honest, the momentum behind the Linux juggernaut is now large enough that Linus could likely decide to chuck it all in and spend the rest of his life on a beach sipping cocktails without worrying about open source politics, and people would figure out a way to ensure that Linux continued to thrive and grow without him. Many a successful start-up has made that transition when the founders leave, and there's no good reason to believe an open source community would be fundamentally different in that regard. The transition to a new leadership structure might be a little messy, but the community would almost certainly figure it out.

However, there's still a lot of scope for Linus to influence how fast Linux grows, and on that front his words and actions suggest that he considers being careless in his speech, without regard for the collateral damage his verbal broadsides may be doing to his cause, more important than having the most positive impact he is capable of having on the future growth of the Linux kernel development project and the open source community at large.

It's not (necessarily) about being nice

It may surprise some folks to learn that I don't consider myself a nice human either. My temper is formidable (I just keep it under control most of the time, a task online communication makes easier by providing the ability to walk away from the computer for a while), and any feelings of compassion I have for others are more a product of years of deliberate practice and hanging around with compassionate people than they are any particularly strong innate knack for empathy.

I'm pretty sure that genuinely nice people do exist, and I assume that one of their key motives for creating open, welcoming, inclusive communities is because it's fundamentally the right thing to do. The main reason I exclude myself from my assumed category of "nice people" is that, while I acknowledge that motivation intellectually, it's not really one that I feel viscerally.

Instead, what I do care about, passionately, is helping the best ideas win (where I include "feasible" as part of my definition of "best"). Not the "best ideas from people willing to tolerate extensive personal abuse". The best ideas anyone is willing to share with me, period. And I won't hear those ideas unless I help create environments where all participants are willing to speak up, not just those that are prepared to accept a blistering verbal barrage from a powerful authority figure as a possible consequence of attempting to participate. Such barrages are upsetting enough when they come from random strangers on the internet; when they come from someone with enormous influence not only over you and your future career, but also your entire industry, they can be devastating.

The second order consequences

So there you have it, my not-nice reason for advocating for more welcoming and inclusive open source communities: because, from an engineering standpoint, I consider "has a high level of tolerance for receiving personal abuse from community leaders" to be an extraordinarily stupid filter to apply to your pool of potential contributors.

Exhibiting abusive behaviour as a leader has additional consequences though, and they can be even more problematic: by regularly demonstrating abusive behaviour yourself, you end up normalising harassment within your community in general, both in public and in private.

I believe Linus when he says he doesn't care about who people are or where they're from, only their contributions. I'm the same way - until I've known them for a while, I tend to care about contributors and potential contributors wholesale (i.e. happy people that enjoy the environment they're participating in tend to spend more time engaged, learn faster, produce superior contributions, and more quickly reach the point of being able to contribute independently), rather than retail (i.e. I care about my friends because they're my friends, regardless of context).

But when you're personally abusive as a leader, you also have to take a high level of responsibility for all the folks that look up to you as a role model, and act out the same behaviours you exhibit in public. When you reach this point, the preconditions for participation in your community now include:

  • Willing to tolerate public personal abuse from project leaders
  • Willing to tolerate public personal abuse from the community at large
  • Willing to tolerate personal abuse in private

With clauses like that as part of the definition, the claim of "meritocracy" starts to feel a little shaky, doesn't it? Meritocracy is a fine ideal to strive for, but claiming to have achieved it when you're imposing irrelevant requirements like this is arrogant nonsense.

We're not done yet, though, as this culture of abuse then combines with elitism based on previously acquired knowledge to make it normal to abuse newcomers for still being in the process of learning. I find it hard to conceive of a more effective approach to keeping people from adopting something you're giving away for free than tolerating a community that publicly abuses people for not magically already knowing how to use technology that they may have never even heard of before.

As a result of this perspective, the only time I'll endeavour to eject anyone from a community where I have significant influence is when they're actively creating an unpleasant environment for other participants, and demonstrate no remorse whatsoever regarding the negative impact their actions are having on the overall collaborative effort. I count myself incredibly fortunate to have only had to do this a few times in my life so far, but it's something I believe in strongly enough for it to have been the basis for once deciding to resign from a position paying a six-figure salary at a company I otherwise loved working for. To that company's credit, the abusive leader was let go not long afterwards, but the whole secretive corporate system is rigged such that these toxic "leaders" can usually quickly find new positions elsewhere and hence new subordinates to make miserable - the fact that I'm not willing to name names here for fear of the professional consequences is just one more example of how the system is largely set up to protect abusive leaders rather than holding them to account for the impact their actions have on others.

Ideas and code are still fair game

One of the spurious fears raised against the apparently radical notion of refusing to tolerate personal abuse in a collaborative environment is that adopting civil communication practices somehow means that bad code must then be accepted into the project.

Eliminating personal abuse doesn't mean eliminating rigorous critique of code and ideas. It just means making sure that you are critiquing the code and the ideas, rather than tearing down the person contributing them. It's the difference between "This code isn't any good, here are the problems with it, I'm confident you can do better on your next attempt" (last part optional but usually beneficial when it comes to growing your contributor community) and "This code is terrible, how dare you befoul my presence with it, begone from my sight, worm!".

The latter response may be a funny joke if done in private between close friends, but when it's done in public, in front of a large number of onlookers who don't know either the sender or the recipient personally, it sets an astoundingly bad example as to what a mutually beneficial collaborative peer relationship should look like.

And if you don't have the self-discipline needed to cope with the changing context of our online interactions in the open source community? Well, perhaps you don't yet have the temperament needed to be an open source leader on an internet that is no longer the sole preserve of those of us that are more interested in computers than we are in people. Most of the technical and business press have yet to figure out that they can actually do a bit of investigative journalism to see how well vendor rhetoric aligns with upstream open source engineering activity (frequency of publication is still a far more important performance metric for most journalists than questioning the spin served up in corporate press releases), so the number of folks peering into the open source development fishbowl is only going to grow over time.

It isn't that hard to learn the necessary self-control, though. It's mostly just a matter of taking the time to read each email or code review comment, look for the parts that are about the contributor rather than the code or the design, and remove them before hitting send. And if that means there's nothing left? Then what you were about to send was pure noise, adding nothing useful to the conversation, and hence best left unsent. Doing anything less than this as a community leader is pure self-indulgence, putting your own unwillingness to consider the consequences of your communications ahead of the long term interests of your project. We're only talking about software here, after all - lives aren't on the line when we're deciding how to respond to a particular contribution, so we can afford to take a few moments to review the messages we're about to send and consider how they're likely to be perceived, both by the recipient, and by everyone else observing the exchange.

With any personal abuse removed, you can be as ruthless about critiquing the code and design as you like. Learning not to take critiques of your work personally is a necessary skill to acquire if your ambition is to become a high profile open source developer - the compromises often necessary in the real world of software design and development mean that you will end up shipping things that can legitimately be described as terrible, and you're going to have to learn to be able to say "Yes, I know it's terrible, for reasons X, Y, and Z, and I decided to publish it anyway. If you don't like it, don't use it.". (I highly recommend giving talks about these areas you know are terrible - they're fun to prepare, fun to give, and it's quite entertaining seeing the horrified reactions when people realise I'm not kidding when I say all software is terrible and computers don't actually work, they just fake it fairly well through an ongoing series of horrible hacks built atop other horrible hacks. I'm not surprised the Internet breaks sometimes - given the decades of accumulated legacy hardware and software we're building on and working around, it's thoroughly astonishing that anything technology related ever works at all.)

But no matter how harsh your technical critiques get, never forget that there's at least one other human on the far end of that code review or email thread. Even if you don't personally care about them, do you really think it's a good idea to go through life providing large numbers of people with public evidence of why you are a thoroughly unpleasant person to be forced to deal with? As a project leader, do you really think you're going to attract the best and brightest people, who are often free to spend their time and energy however they like, if you set up a sign saying "You must be willing to tolerate extensive personal abuse in order to participate here"?

What can we do about it?

First, and foremost, for those of us that are paid open source community leaders, we can recognise that understanding and motivating our contributors and potential contributors in order to grow our communities is part of our job. If we don't like that, if we'd prefer to be able to "just focus on the code", to the degree where we're not willing to learn how to moderate our own behaviour in accordance with our level of responsibility, then we need to figure out how to reorganise things such that there's someone with better people management and communication skills in a position to act as a buffer between us and our respective communities.

If we instead decide we need to better educate ourselves, then there are plenty of resources available for us to do so. For folks just beginning to explore questions of systemic bias and defaulting to exclusivity, gender-based bias is a good one to start with, by perusing resources like the Feminism 101 section on the Geek Feminism wiki, or (if we have the opportunity) attending an Ada Initiative Ally Skills workshop.

And if we do acknowledge the importance of this work, then we can use our influence to help it continue, whether that's by sponsoring educational workshops, supporting financial assistance programs, ensuring suitable codes of conduct are in place for our events and online communities, supporting programs like the GNOME Outreach Program for Women, or organisations like the Ada Initiative, and so on, and so forth.

For those of us that aren't community leaders, one of the most effective things we can do is vote with our feet: at last count, there are over a million open source projects in existence, and many of them are run in such a way that participating in them is almost always a sheer pleasure. If no existing community grabs your interest, you always have the option of starting your own.

Personal enjoyment is only one reason for participating in open source though, and professional obligations or personal needs may bring us into contact with project leaders and contributors that currently consider personal abuse to be an acceptable way of interacting with their peers in a collaborative context. If leaving isn't a viable option, then what can we do?

Firstly, the options I suggest above for community leaders are actually good options for any participants in the open source community that view the overall growth and success of the free and open source software ethos as being more important than any one individual's personal pride or reluctance to educate themselves about issues that don't affect them personally.

Secondly, we can hold our leaders to account. When community leaders give presentations at community events, especially when presenting on community management topics, feel free to ask the following questions (or variations on these themes):

  • Are we as community leaders aware of the impact current and historical structural inequalities have on the demographics of our community?
  • What have we done recently as individuals to improve our understanding of these issues and their consequences?
  • What have we done recently as community leaders to attempt to counter the systemic biases adversely affecting the demographics of our communities?

These are questions that open source community leaders should be able to answer. When we can't, I guess the silver lining is that it means we have plenty of scope to get better at what we do. For members of vulnerable groups, an inability for leaders to answer these questions is also a strong sign as to which communities may not yet be able to provide safe spaces for you to participate without experiencing harassment over your identity rather than being critiqued solely based on the quality of your work.

If you ask these questions, you will get people complaining about bringing politics into a technical environment. The folks complaining are simply wrong, as the single most important factor driving the quality of our technology is the quality of our thinking. Assuming we have attained meritocracy (aka "the most effective collaborative environment possible") is sheer foolishness, when a wide array of systemic biases remain in place that work to reduce the depth, breadth, and hence quality, of our collective thinking.

Update 22 Jan, 2015: Minor typo and grammar fixes

DTCA Public Consultation - Brisbane

Over the weekend, Asher Wolf alerted me (and many others in the open source and cryptographic communities) to the Australian Defence Trade Controls Act 2012, and the current public consultation taking place around a bill proposing amendments to that act.

I'm heavily involved in improving the security of open source infrastructure like the Python Package Index and the Python 2 reference interpreter, I work at a multinational open source vendor, and I have an extensive background in working under the constraints of the US International Traffic in Arms Regulations (ITAR), so Asher's concern caught my attention: bad legislation in this area can have significant chilling effects on legitimate research and development activities.

As a result, I've escalated this legislation for review by the legal teams at various open source companies and organisations, with a view to making formal submissions to the public consultation process that is open until January 30th (ready for bills to be submitted for consideration to federal parliament on February 23rd).

However, I was also able to attend the first public consultation session held at the University of Queensland on January 19, so these are my impressions based primarily on that session and my own experiences dealing with ITAR. I'm not a lawyer and I haven't actually read the legislation, so I'm not going to pick up on any drafting errors, but I can at least speak to the intent of the folks involved in moving this process forward.

What not to worry about

To folks encountering this kind of legislation for the first time, the sheer scope of the Defence and Strategic Goods List can seem absolutely terrifying. This was very clear to me from some of the questions various academics in the room were asking.

On this particular point, I can only say: "Don't panic". This isn't a unique-to-Australia list, it's backed by a treaty called the Wassenaar Arrangement - the DSGL represents part of the implementation of that arrangement into Australian law.

When the laws implementing that arrangement are well drafted, everyone outside the military industrial complex (and certain easily weaponised areas of scientific research) can pretty much ignore them, while everyone inside the military industrial complex (and the affected areas of research) pays very close attention to them because we like not being in jail (and because gunrunning is bad, and bioterrorism is worse, mmm'kay?).

A heavily regulated military supply chain is already scary enough, we really don't want to see the likely consequences of an unregulated one. (And if you're tempted to make a snarky comment about the latter already being the case, no, it really isn't. While folks can sometimes use overclassification to avoid regulations they're supposed to be following, that still introduces significant friction and inefficiencies into whatever they're doing. It's not as good as people actually respecting the laws of the countries they're supposedly defending, including genuinely meeting the requirement for civilian authority over the military, but it's still a hell of a lot better than nothing).

Getting back on topic, the US ITAR and crypto export control laws are currently considered the most strict implementation of the Wassenaar Arrangement amongst the participating nations (going beyond the requirements of the treaty in several areas), so if you see plenty of US nationals participating in an activity without being fined and going to jail, you can be fairly confident that it isn't actually a controlled activity under the DSGL (or, even if it is, permits for that specific activity will be fairly easy to get, and the most likely consequence of not realising you need a permit for something you're doing will be someone from your government getting in touch to point out that you should apply for one).

There are certainly some very questionable aspects of this list (with the perennial "favourite" being the fact the Wassenaar Arrangement does, in fact, attempt to regulate the global trade in mathematics, which is just as stupid and problematic as it sounds), but it's a known quantity, and one we're pretty sure we can continue to live with (at least for the time being).

What to worry about

The real problem here is that the regulations included in the 2012 Act are not well drafted, and the legislated 2 year transition period from May 2013 through to May 2015 prior to the enforcement provisions kicking in is about to run out.

The biggest problem with the 2012 Act is that in trying to keep things simple (essentially, "if it's on the DSGL, you need a permit"), it ended up becoming extraordinarily draconian, requiring a permit for things that don't require an export license even under ITAR.

For the general public, the most significant shift in the 2015 amendment bill is the fact that several cases around open publication of information related to dual-use technologies shift to being allowed by default, and only in exceptional cases would a permit be required (and in those cases, the onus would be on the government to inform the covered individuals of that requirement).

The amendments also include a variety of additional exemptions for little things like making it legal for Australia's own police and security agencies to collaborate with their international counterparts. (Snarky comment opportunity #2: in certain areas, making such collaboration illegal seems like a potentially attractive idea...)

That 2 year pilot was included in the original legislation as a safety mechanism, the feedback from the associated steering group has been extensive, and if things had gone according to plan, the relevant amendments to the bill would have been passed last year in the spring sitting of federal parliament, leaving DECO with at least 6 months to educate affected organisations and individuals, and start issuing the now necessary permits before the enforcement provisions became active in May. Unfortunately, we currently have a federal government that views pushing a particular ideological agenda as being more important than actually doing their job, so we're now faced with the prospect of regulations that industry doesn't want, academia doesn't want, the Australian public service don't want, and the Australian military don't want, coming into effect anyway.

Isn't politics fun?

What DECO are (trying to) do about it

The group tasked with untangling this particular legislative Charlie Foxtrot is the Australian Defence Export Control Office (DECO). Their proposal for addressing the situation hinges on two bills that they plan to put before the next sitting of federal parliament:

  • an amendment bill for the Act itself, which fixes it to be a conventional implementation of the Wassenaar Arrangement, in line with existing implementations in other Wassenaar nations (why we didn't just do that in the first place is beyond me, but at least DECO are trying to fix the mistake now)
  • a second bill to delay the enactment of the enforcement provisions for a further six months to provide sufficient time for DECO to properly educate affected parties and start issuing permits

As far as I am aware, the second bill is needed primarily due to the consideration of the first bill slipping by six months, since we're now looking at the prospect of only having 4 weeks for DECO to start issuing permits before the enforcement provisions come into effect. Nobody involved thinks that's a good idea.

If both of those bills pass promptly, then the only cause for concern is whether or not there are any remaining devils in the details of the legislation itself. Members of the general public aren't going to be able to pick those up - despite the surface similarities, legalese isn't English, and reading it without interpreting it in the context of relevant case law can be a good way to get yourself into trouble. Summary translations from legalese to English by a competent lawyer are a much safer bet, although still not perfect. (For the programmers reading this: I personally find it useful to think of legalese as source code that runs on the language interpreter of a given nation's legal system, while the English translations are the code comments and documentation that anyone should be able to read if they understand the general concepts involved).

If at least the second bill passes, then we have another 6 months to work on a better resolution to the problem.

If neither bill passes, then DECO end up in a bad situation where they'll be required by law to implement and enforce regulations that they're convinced are a bad idea. They actually have everything in place to do that if they have to, but they don't want this outcome, and neither does anyone else.

What industry and academia can do about it

While it's very short notice, the main thing industry and academia can do is to file formal submissions with DECO as described in their overview of the public consultation process.

There are three main things to be addressed on that front:

  • ensuring federal parliament are aware of the importance of amending the Defence Trade Controls Act 2012 to eliminate the more draconian provisions
  • ensuring federal parliament are aware of the infeasibility of putting this into effect on the original timeline and the need for a significant delay in the introduction of the enforcement provisions
  • ensuring DECO are alerted to any remaining areas of concern in the specific drafting of the amended legislation (although I'd advise skipping this one if you're not a lawyer yourself - it's the functional equivalent of a lawyer with no training as a programmer proposing patches to the Linux kernel)

We were apparently asleep at the wheel when DTCA went through in 2012, so we owe a lot of thanks to whoever it was that advocated for and achieved the inclusion of the two year transition and consultation period in the original bill. Now we need to help ensure that our currently somewhat dysfunctional federal parliament doesn't keep us from receiving the benefit of that foresight.

What's definitely not going to happen

This consultation process is not the place to rail against the details of the Wassenaar Arrangement or Australia's participation in it. You won't achieve anything except to waste the time of folks that currently have a really serious problem to fix, and a very limited window in which to fix it.

Yes, Wassenaar has some serious problems, especially around its handling of cryptography and cryptographic research, but we have a fairly settled approach to handling that at this point in history. The critical concern in this current case is to help DECO ensure that the associated Australian regulations can be readily handled through the mechanisms that have already been put in place to handle existing Wassenaar enforcement regimes in other countries. With the way the 2012 Act was drafted, that's almost certainly currently not the case, but the proposed 2015 amendments should fix it (assuming the amendments actually have the effects that DECO has indicated they're intended to).

Running Kallithea on OpenShift

Kallithea for CPython

The CPython core development team are currently evaluating our options for modernising our core development workflows to better match the standards set by other projects and services like OpenStack and GitHub.

The first step in my own proposal for that is to migrate a number of the support repositories currently hosted using a basic Mercurial server on to an instance of Kallithea hosted as (Kallithea is a GPLv3 Python project that was forked from RhodeCode after certain aspects of the latter's commercialisation efforts started alienating several members of their user and developer community)

Tymoteusz Jankowski (a contributor to Allegro Group's open source data centre inventory management system, Ralph), has already started looking at the steps that might be involved in integrating a Kallithea instance into the PSF's Salt based infrastructure automation.

However, for my proposal to be as successful as I would like it to be, I need the barriers to entry for the development and deployment of the upstream Kallithea project itself to be as low as possible. One of the challenges we've often had with gaining contributors to CPython infrastructure maintenance is the relatively high barriers to entry for trying out service changes and sharing them with others, so this time I plan to tackle that concern first, by ensuring that addressing it is a mandatory requirement in my proposal.

That means tackling two particular problems:

  • Having a way to easily run local test instances for development and experimentation
  • Having a way to easily share demonstration instances with others

For the first problem, I plan to rely on Vagrant and Docker, while for the second I'll be relying on the free tier in Red Hat's OpenShift Online service. Unfortunately, while the next generation of OpenShift will support Docker images natively, for the time being, I need to tackle these as two separate problems, as there aren't any existing Docker based services I'm aware of with a free tier that is similarly suited to the task of sharing development prototypes for open source web services with a broad audience (let alone any such services that are also fully open source).

Once I have these working to my satisfaction, I'll propose them to the Kallithea team for inclusion in the Kallithea developer documentation, but in the meantime I'll just document them here on the blog.

Enabling Kallithea deployment on OpenShift

My first priority is to get a public demonstration instance up and running that I can start tweaking towards the CPython core development community's needs (e.g. installing the custom repo hooks we run on), so I'm starting by figuring out the OpenShift setup needed to run public instances - the Vagrant/Docker based setup for local development will come later.

Conveniently, WorldLine previously created an OpenShift quickstart for RhodeCode and published it under the Apache License 2.0, so I was able to use that as a starting point for my own Kallithea quickstart.

While I personally prefer to run Python web services under mod_wsgi in order to take advantage of Apache's authentication & authorisation plugin ecosystem, that's not a significant concern for the demonstration server use case I have in mind here. There are also some other aspects in the WorldLine quickstart I'd like to understand better and potentially change (like figuring out a better way of installing git that doesn't involve hardcoding a particular version), but again, not a big deal for demonstration instances - rather than worrying about them too much, I just annotated them as TODO comments in the OpenShift hook source code.

I'd also prefer to be running under the official Python 2.7 cartridge rather than a DIY cartridge, but again, my focus at this point is on getting something up and running, and then iterating from there to improve it.

That meant adapting the quickstart from RhodeCode to Kallithea was mostly just a matter of changing the names of the various components being installed and invoked, together with changing the actual installation and upgrade steps to be based on Kallithea's deployment instructions.

The keys to this are the build hook and the start hook. The OpenShift docs have more details on the various available action hooks and when they're run.

In addition to the TODO comments noted above, I also added various comments explaining what different parts of the action hook scripts were doing.

(Note: I haven't actually tested an upgrade, only the initial deployment described below, so I can't be sure I have actually adapted the upgrade handling correctly yet)

Deploying my own Kallithea instance

I already have an OpenShift account, so I could skip that step, and just create a new app under my existing account. However, I didn't have the command line tools installed, so that was the first step in creating my own instance:

sudo yum install /usr/bin/rhc

yum is able to figure out on my behalf that it is rubygems-rhc that provides the command line tools for OpenShift in Fedora (alternatively, I could have looked that up myself in the OpenShift client tools installation docs).

The next step was to configure the command line tools to use my OpenShift Online account, generate a local login token for this machine, and upload my public SSH key to OpenShift Online. That process involved working through the interactive prompts in:

rhc setup

With those preliminary OpenShift steps out of the way, it was time to move on to deploying the application itself. It's worth noting that app creation automatically clones a local git repo named after the application, so I created a separate "app_repos" subdirectory in my development directory specifically so I could call my OpenShift app "kallithea" without conflicting with my local clone of the main kallithea repo.

As described in the quickstart README, the app creation command is:

rhc app create kallithea diy-0.1 postgresql-9.2

That churned away for a while, and then attempted to clone the app repo locally over ssh (with SSH putting up a prompt to accept the validity of the app's freshly generated SSH key). I'm not sure why, but for some reason that automatic clone operation didn't work for me. rhc put up a detailed message explaining that the app creation had worked, but the clone step had failed. Fortunately, as the troubleshooting notice suggested, a subsequent rhc git-clone kallithea worked as expected.

OpenShift provides a default app skeleton automatically, but I actually want to get rid of that and replace it with the contents of the quickstart repo:

rm -R diy .openshift misc
git add .
git commit -m "Remove template files"
git remote add quickstart -m master
git pull -s recursive -X theirs quickstart master

The default merge commit message that popped up was fine, so I just accepted that and moved on to the most interesting step:

git push

Because this is the first build, there's a lot of output related to installing and building the PostgreSQL driver and git, before moving on to installing Kallithea and its dependencies.

However, that still didn't take long, and completed without errors, so I now have my own Kallithea instance up and running.

And no, the default admin credentials created by the quickstart won't work anymore - I immediately logged in to the admin account to change them!

Where to from here?

There are various aspects of the current quickstart that are far from ideal, but I don't plan to spend a lot of time worrying about it when I know that support for using Docker images directly in OpenShift is coming at some point in the not too distant future.

One of the key advantages of Docker is the much nicer approach it offers to layered application development where infrastructure experts can provide base images for others to build on, and in the case of deploying Python applications with mod_wsgi, that means listening to Graham Dumpleton (the author of mod_wsgi, currently working for New Relic).

On that front, Graham has actually been working on creating a set of Debian based mod_wsgi Docker images that Python developers can use, rather than having to build their own from scratch.

In my case, I'd really prefer something based on CentOS 7 or Fedora Cloud, but that's a relatively minor quibble, and Graham's images should still make a great basis for putting together a Vagrant+Docker based local workflow for folks working on Kallithea.

That, however, is a topic for a future post :)

Seven billion seconds per second

A couple of years ago, YouTube put together their "One hour per second" site, visualising the fact that for every second of time that elapses, an hour of video is uploaded to YouTube. Their current statistics page indicates that figure is now up to 100 hours per minute (about 1.7 hours per second).

Impressive numbers to be sure. However, there's another set of numbers I personally consider significantly more impressive: every second, more than seven billion seconds are added to the tally of collective human existence on Earth.

Think about that for a moment.

Tick. Another 7 billion seconds of collective human existence.

Tick. Another 117 million minutes of collective human existence.

Tick. Another 2 million hours of collective human existence.

Tick. Another 81 thousand days of collective human existence.

Tick. Another 11 thousand weeks of collective human existence.

Tick. Another 222 years of collective human existence.

222 years of collective human experience, every single second, of every single day. And as the world population grows, it's only going to get faster.

222 years of collective human experience per second.

13 millennia per minute.

801 millennia per hour.

19 million years per day.

135 million years per week.

7 billion years per year.

The growth in our collective human experience over the course of a single year would stretch halfway back to the dawn of time if it was experienced by an individual.
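For anyone who wants to check the arithmetic, here is a rough sketch. It assumes a round world population of 7 billion and a 52 week (364 day) year. Both are my own simplifications for illustration, so the results only match the rounded figures above approximately.

```python
# Assumptions for illustration: a round population of 7 billion and a
# 52-week (364-day) year. Different choices shift the results slightly.
POPULATION = 7_000_000_000

SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 60 * SECONDS_PER_MINUTE
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY
SECONDS_PER_YEAR = 52 * SECONDS_PER_WEEK

# Each elapsed second adds POPULATION person-seconds of experience,
# which we convert to person-years.
years_per_second = POPULATION / SECONDS_PER_YEAR

rates = {
    "second": years_per_second,
    "minute": years_per_second * SECONDS_PER_MINUTE,
    "hour": years_per_second * SECONDS_PER_HOUR,
    "day": years_per_second * SECONDS_PER_DAY,
    "week": years_per_second * SECONDS_PER_WEEK,
    "year": years_per_second * SECONDS_PER_YEAR,
}

for unit, years in rates.items():
    print(f"{years:,.0f} years of collective experience per {unit}")
```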

We currently squander most of that potential. We allow a lot of it to be wasted scrabbling for the basic means of survival like food, clean water and shelter. We lock knowledge up behind closed doors, forcing people to reinvent solutions to already solved problems because they can't afford the entry fee.

We ascribe value to people based solely on their success in the resource acquisition game that is the market economy, without acknowledging the large degree to which sheer random chance is frequently the determinant in who wins and who loses.

We inflict bile and hate on people who have the temerity to say "I'm here, I'm human, and I have a right to be heard", while being different from us. We often focus on those superficial differences, rather than our underlying common humanity.

We fight turf wars based on where we were born, the colour of our skin, and which supernatural beings or economic doctrines we allow to guide our actions.

Is it possible to change this? Is it possible to build a world where we consider people to have inherent value just because they're fellow humans, rather than because of specific things they have done, or specific roles they take up?

I honestly don't know, but it seems worthwhile to try. I certainly find it hard to conceive of a better possible way to spend my own meagre slice of those seven billion seconds per second :)

The transition to multilingual programming

A recent thread on python-dev prompted me to summarise the current state of the ongoing industry wide transition from bilingual to multilingual programming as it relates to Python's cross-platform support. It also relates to the reasons why Python 3 turned out to be more disruptive than the core development team initially expected.

A good starting point for anyone interested in exploring this topic further is the "Origin and development" section of the Wikipedia article on Unicode, but I'll hit the key points below.

Monolingual computing

At their core, computers only understand single bits. Everything above that is based on conventions that ascribe higher level meanings to particular sequences of bits. One particularly important set of conventions for communicating between humans and computers is "text encodings": conventions that map particular sequences of bits to text in the actual languages humans read and write.

One of the oldest encodings still in common use is ASCII (which stands for "American Standard Code for Information Interchange"), developed during the 1960's (it just had its 50th birthday in 2013). This encoding maps the letters of the English alphabet (in both upper and lower case), the decimal digits, various punctuation characters and some additional "control codes" to the 128 numbers that can be encoded as a 7-bit sequence.

Many computer systems today still only work correctly with English - when you encounter such a system, it's a fairly good bet that either the system itself, or something it depends on, is limited to working with ASCII text. (If you're really unlucky, you might even get to work with modal 5-bit encodings like ITA-2, as I have. The legacy of the telegraph lives on!)

Working with local languages

The first attempts at dealing with this limitation of ASCII simply assigned meanings to the full range of 8-bit sequences. Known collectively as "Extended ASCII", each of these systems allowed for an additional 128 characters, which was enough to handle many European and Cyrillic scripts. Even 256 characters was nowhere near sufficient to deal with Indic or East Asian languages, however, so this time also saw a proliferation of ASCII incompatible encodings like ShiftJIS, ISO-2022 and Big5. This is why Python ships with support for dozens of codecs from around the world.
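As a small illustration of why the choice of encoding matters, the sketch below decodes the same single byte with two different "Extended ASCII" codecs from Python's standard library, then shows that strict ASCII rejects it. The byte value and the sample character are arbitrary choices of mine, not anything from a real data set.

```python
# The same 8-bit value means different things in different code pages.
raw = bytes([0xE9])
as_western = raw.decode("latin-1")   # ISO-8859-1 (Western European)
as_cyrillic = raw.decode("cp1251")   # Windows Cyrillic code page
print(as_western, as_cyrillic)

# Plain ASCII only defines the 7-bit range, so it refuses the byte outright.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc
    print("ASCII cannot decode it:", exc.reason)

# East Asian encodings such as Shift JIS need more than one byte
# for most of their characters.
kanji_bytes = "陽".encode("shift_jis")
print(len(kanji_bytes), "bytes for one kanji in Shift JIS")
```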

This proliferation of encodings required a way to tell software which encoding should be used to read the data. For protocols that were originally designed for communication between computers, agreeing on a common text encoding is usually handled as part of the protocol. In cases where no encoding information is supplied (or to handle cases where there is a mismatch between the claimed encoding and the actual encoding), then applications may make use of "encoding detection" algorithms, like those provided by the chardet package for Python. These algorithms aren't perfect, but can give good answers when given a sufficient amount of data to work with.

Local operating system interfaces, however, are a different story. Not only don't they inherently convey encoding information, but the nature of the problem is such that trying to use encoding detection isn't practical. Two key systems arose in an attempt to deal with this problem:

  • Windows code pages
  • POSIX locale encodings

With both of these systems, a program would pick a code page or locale, and use the corresponding text encoding to decide how to interpret text for display to the user or combination with other text. This may include deciding how to display information about the contents of the computer itself (like listing the files in a directory).

The fundamental premise of these two systems is that the computer only needs to speak the language of its immediate users. So, while the computer is theoretically capable of communicating in any language, it can effectively only communicate with humans in one language at a time. All of the data a given application was working with would need to be in a consistent encoding, or the result would be uninterpretable nonsense, something the Japanese (and eventually everyone else) came to call mojibake.

It isn't a coincidence that the name for this concept came from an Asian country: the encoding problems encountered there make the issues encountered with European and Cyrillic languages look trivial by comparison.
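Mojibake is easy to reproduce deliberately. The following minimal sketch encodes a string as UTF-8 and then decodes it with the wrong codec. No error is raised, the text is just silently wrong. The sample word is my own, and the "repair" step only works because we happen to know which wrong codec was applied.

```python
original = "café"

# Encode with one convention...
utf8_bytes = original.encode("utf-8")

# ...and decode with another. latin-1 accepts every byte value,
# so this never fails, it just produces nonsense.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)

# If the wrong codec is known, the damage can be reversed.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)
```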

Unfortunately, this "bilingual computing" approach (so called because the computer could generally handle English in addition to the local language) causes some serious problems once you consider communicating between computers. While some of those problems were specific to network protocols, there are some more serious ones that arise when dealing with nominally "local" interfaces:

  • networked computing meant one username might be used across multiple systems, including different operating systems
  • network drives allow a single file server to be accessed from multiple clients, including different operating systems
  • portable media (like DVDs and USB keys) allow the same filesystem to be accessed from multiple devices at different points in time
  • data synchronisation services like Dropbox need to faithfully replicate a filesystem hierarchy not only across different desktop environments, but also to mobile devices

For these protocols that were originally designed only for local interoperability, communicating encoding information is generally difficult, and it doesn't necessarily match the claimed encoding of the platform you're running on.

Unicode and the rise of multilingual computing

The path to addressing the fundamental limitations of bilingual computing actually started more than 25 years ago, back in the late 1980's. An initial draft proposal for a 16-bit "universal encoding" was released in 1988, the Unicode Consortium was formed in early 1991 and the first volume of the first version of Unicode was published later that same year.

Microsoft added new text handling and operating system APIs to Windows based on the 16-bit C level wchar_t type, and Sun also adopted Unicode as part of the core design of Java's approach to handling text.

However, there was a problem. The original Unicode design had decided that "16 bits ought to be enough for anybody" by restricting their target to only modern scripts, and only frequently used characters within those scripts. However, when you look at the "rarely used" Kanji and Han characters for Japanese and Chinese, you find that they include many characters that are regularly used for the names of people and places - they're just largely restricted to proper nouns, and so won't show up in a normal vocabulary search. So Unicode 2.0 was defined in 1996, expanding the system out to a maximum of 21 bits per code point (using up to 32 bits per code point for storage).

As a result, Windows (including the CLR) and Java now use the little-endian variant of UTF-16 to allow their text APIs to handle arbitrary Unicode code points. The original 16-bit code space is now referred to as the Basic Multilingual Plane.
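The difference between the Basic Multilingual Plane and the rest of the code space is visible from Python 3.3 or later. This sketch counts how many 16-bit units UTF-16-LE needs for one code point inside the BMP and one outside it. The two sample characters are arbitrary picks of mine.

```python
import sys

# Python 3.3+ always covers the full Unicode range, which fits in 21 bits.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())

bmp_char = "陽"              # a code point inside the BMP
astral_char = "\U0001F600"   # a code point outside the BMP

def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16-LE."""
    return len(text.encode("utf-16-le")) // 2

bmp_units = utf16_units(bmp_char)
astral_units = utf16_units(astral_char)
astral_bytes = astral_char.encode("utf-16-le")

# Code points beyond the BMP are stored as a surrogate pair.
print(bmp_units, astral_units, astral_bytes)
```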

While all that was going on, the POSIX world ended up adopting a different strategy for migrating to full Unicode support: attempting to standardise on the ASCII compatible UTF-8 text encoding.

The choice between using UTF-8 and UTF-16-LE as the preferred local text encoding involves some complicated trade-offs, and that's reflected in the fact that they have ended up being at the heart of two competing approaches to multilingual computing.

Choosing UTF-8 aims to treat formatting text for communication with the user as "just a display issue". It's a low impact design that will "just work" for a lot of software, but it comes at a price:

  • because encoding consistency checks are mostly avoided, data in different encodings may be freely concatenated and passed on to other applications. Such data is typically not usable by the receiving application.
  • for interfaces without encoding information available, it is often necessary to assume an appropriate encoding in order to display information to the user, or to transform it to a different encoding for communication with another system that may not share the local system's encoding assumptions. These assumptions may not be correct, but won't necessarily cause an error - the data may just be silently misinterpreted as something other than what was originally intended.
  • because data is generally decoded far from where it was introduced, it can be difficult to discover the origin of encoding errors.
  • as a variable width encoding, it is more difficult to develop efficient string manipulation algorithms for UTF-8. Algorithms originally designed for fixed width encodings will no longer work.
  • as a specific instance of the previous point, it isn't possible to split UTF-8 encoded text at arbitrary locations. Care needs to be taken to ensure splits only occur at code point boundaries.
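The last point can be demonstrated in a few lines. The sketch below first shows a naive byte split failing, then uses a small helper (`safe_split` is a hypothetical name of my own, not a standard library function) that backs up to the nearest code point boundary. It assumes the input is valid UTF-8, and it only respects code point boundaries, not user-perceived characters.

```python
def safe_split(data: bytes, index: int):
    """Split UTF-8 bytes at or just before index, never inside a code point."""
    # Continuation bytes in UTF-8 always have the bit pattern 10xxxxxx.
    while 0 < index < len(data) and (data[index] & 0xC0) == 0x80:
        index -= 1
    return data[:index], data[index:]

encoded = "naïve".encode("utf-8")   # the "ï" takes two bytes

# Splitting at an arbitrary byte offset can cut a code point in half.
try:
    encoded[:3].decode("utf-8")
except UnicodeDecodeError:
    naive_split_failed = True
else:
    naive_split_failed = False

head, tail = safe_split(encoded, 3)
print(naive_split_failed, head.decode("utf-8"), tail.decode("utf-8"))
```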

UTF-16-LE shares the last two problems, but to a lesser degree (simply due to the fact most commonly used code points are in the 16-bit Basic Multilingual Plane). However, because it isn't generally suitable for use in network protocols and file formats (without significant additional encoding markers), the explicit decoding and encoding required encourages designs with a clear separation between binary data (including encoded text) and decoded text data.

Through the lens of Python

Python and Unicode were born on opposite sides of the Atlantic Ocean at roughly the same time (1991). The growing adoption of Unicode within the computing industry has had a profound impact on the evolution of the language.

Python 1.x was purely a product of the bilingual computing era - it had no support for Unicode based text handling at all, and was hence largely limited to 8-bit ASCII compatible encodings for text processing.

Python 2.x was still primarily a product of the bilingual era, but added multilingual support as an optional addon, in the form of the unicode type and support for a wide variety of text encodings. PEP 100 goes into the many technical details that needed to be covered in order to incorporate that feature. With Python 2, you can make multilingual programming work, but it requires an active decision on the part of the application developer, or at least that they follow the guidelines of a framework that handles the problem on their behalf.

By contrast, Python 3.x is designed to be a native denizen of the multilingual computing world. Support for multiple languages extends as far as the variable naming system, such that languages other than English become almost as well supported as English already was in Python 2. While the English inspired keywords and the English naming in the standard library and on the Python Package Index mean that Python's "native" language and the preferred language for global collaboration will always be English, the new design allows a lot more flexibility when working with data in other languages.

Consider processing a data table where the headings are names of Japanese individuals, and we'd like to use collections.namedtuple to process each row. Python 2 simply can't handle this task:

>>> from collections import namedtuple
>>> People = namedtuple("People", u"陽斗 慶子 七海")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/", line 310, in namedtuple
    field_names = map(str, field_names)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Users need to either restrict themselves to dictionary style lookups rather than attribute access, or else use romanised versions of their names (Haruto, Keiko, Nanami for the example). However, the case of "Haruto" is an interesting one, as there are at least 3 different ways of writing that as Kanji (陽斗, 陽翔, 大翔), but they are all romanised as the same string (Haruto). If you try to use romaji to handle a data set that contains more than one variant of that name, you're going to get spurious collisions.

Python 3 takes a very different perspective on this problem. It says it should just work, and it makes sure it does:

>>> from collections import namedtuple
>>> People = namedtuple("People", u"陽斗 慶子 七海")
>>> d = People(1, 2, 3)
>>> d.陽斗
1
>>> d.慶子
2
>>> d.七海
3

This change greatly expands the kinds of "data driven" use cases Python can support in areas where the ASCII based assumptions of Python 2 would cause serious problems.
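The romanisation collision mentioned earlier is also easy to reproduce in Python 3. In this sketch, the mapping from Kanji to romaji is typed in by hand purely for illustration (a real application would need a proper transliteration source), and `namedtuple` is left with its default behaviour of rejecting duplicate field names.

```python
from collections import namedtuple

# Hand-written illustrative mapping: three distinct names, one romanisation.
romaji = {"陽斗": "Haruto", "陽翔": "Haruto", "大翔": "Haruto"}

# Using the romanised headings collapses three columns into one name,
# which namedtuple refuses to accept.
try:
    namedtuple("People", list(romaji.values()))
except ValueError as exc:
    collided = True
    print("Romanised headings collide:", exc)
else:
    collided = False

# Using the original headings keeps all three columns distinct.
People = namedtuple("People", list(romaji.keys()))
row = People(1, 2, 3)
print(row.陽斗, row.陽翔, row.大翔)
```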

Python 3 still needs to deal with improperly encoded data however, so it provides a mechanism for arbitrary binary data to be "smuggled" through text strings as lone surrogate code points (U+DC80 through U+DCFF). This feature was added by PEP 383 and is managed through the surrogateescape error handler, which is used by default on most operating system interfaces. This recreates the old Python 2 behaviour of passing improperly encoded data through unchanged when dealing solely with local operating system interfaces, but complaining when such improperly encoded data is injected into another interface. The codec error handling system provides several tools to deal with these files, and we're looking at adding a few more relevant convenience functions for Python 3.5.
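Here is a minimal sketch of that round trip using the surrogateescape error handler directly. The byte string is a made-up example of a filename encoded as latin-1 on a system that claims to use UTF-8.

```python
# A latin-1 encoded name: the 0xE9 byte is not valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# surrogateescape turns each undecodable byte into a lone surrogate
# code point instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# A strict encode (as used by most other interfaces) complains instead.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    strict_encode_failed = True
else:
    strict_encode_failed = False

print(round_tripped == raw, strict_encode_failed)
```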

The underlying Unicode changes in Python 3 also made PEP 393 possible, which changed the way the CPython interpreter stores text internally. In Python 2, even pure ASCII strings would consume four bytes per code point on Linux systems. Using the "narrow build" option (as the Python 2 Windows builds from do) reduced that to only two bytes per code point when operating within the Basic Multilingual Plane, but at the cost of potentially producing wrong answers when asked to operate on code points outside the Basic Multilingual Plane. By contrast, starting with Python 3.3, CPython now stores text internally using the smallest fixed width data unit possible. That is, latin-1 text uses 8 bits per code point, UCS-2 (Basic Multilingual Plane) text uses 16 bits per code point, and only text containing code points outside the Basic Multilingual Plane will expand to needing the full 32 bits per code point. This can not only significantly reduce the amount of memory needed for multilingual applications, but may also increase their speed as well (as reducing memory usage also reduces the time spent copying data around).
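The effect of the flexible string representation can be observed (on CPython 3.3 or later, which is an assumption of this sketch) by comparing the sizes of two strings of different lengths built from the same character. Differencing the two sizes cancels out the fixed object header, leaving just the per code point cost. The helper name `bytes_per_code_point` is my own invention.

```python
import sys

def bytes_per_code_point(char, count=1000):
    """Estimate per code point storage by differencing two string sizes."""
    longer = sys.getsizeof(char * (2 * count))
    shorter = sys.getsizeof(char * count)
    return (longer - shorter) // count

ascii_cost = bytes_per_code_point("a")
latin1_cost = bytes_per_code_point("é")
bmp_cost = bytes_per_code_point("陽")
astral_cost = bytes_per_code_point("\U0001F600")

print(ascii_cost, latin1_cost, bmp_cost, astral_cost)
```

Note that the widest code point in a string sets the width for the whole string, so a single character outside the Basic Multilingual Plane is enough to push an otherwise ASCII string to 32 bits per code point.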

Are we there yet?

In a word, no. Not for Python 3.4, and not for the computing industry at large. We're much closer than we ever have been before, though. Most POSIX systems now use UTF-8 as their default encoding, and many systems offer a C.UTF-8 locale as an alternative to the traditional ASCII based C locale. When dealing solely with properly encoded data and metadata, and properly configured systems, Python 3 should "just work", even when exchanging data between different platforms.
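To see what a particular system is actually telling Python, the standard library exposes the relevant settings. This is only a diagnostic sketch: the values it reports depend entirely on the platform, the locale configuration and the Python version, so no particular output should be expected.

```python
import codecs
import locale
import sys

# The encoding the locale system recommends for text data.
preferred = locale.getpreferredencoding(False)

# The encoding (and error handler) used for filenames and similar
# operating system interfaces.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()   # available in Python 3.6+

# Standard output may have been replaced by an object without an encoding.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# codecs.lookup() normalises aliases such as "UTF-8" and "utf8".
print("locale preferred:", codecs.lookup(preferred).name)
print("filesystem:", codecs.lookup(fs_encoding).name, "with", fs_errors)
print("stdout:", stdout_encoding)
```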

For Python 3, the remaining challenges fall into a few areas:

  • helping existing Python 2 users adopt the optional multilingual features that will prepare them for eventual migration to Python 3 (as well as reassuring those users that don't wish to migrate that Python 2 is still fully supported, and will remain so for at least the next several years, and potentially longer for customers of commercial redistributors)
  • adding back some features for working entirely in the binary domain that were removed in the original Python 3 transition due to an initial assessment that they were operations that only made sense on text data (PEP 461 summary: bytes.__mod__ is coming back in Python 3.5 as a valid binary domain operation, bytes.format stays gone as an operation that only makes sense when working with actual text data)
  • better handling of improperly decoded data, including poor encoding recommendations from the operating system (for example, Python 3.5 will be more sceptical when the operating system tells it the preferred encoding is ASCII and will enable the surrogateescape error handler on sys.stdout when it occurs)
  • eliminating most remaining usage of the legacy code page and locale encoding systems in the CPython interpreter (this most notably affects the Windows console interface and argument decoding on POSIX. While these aren't easy problems to solve, it will still hopefully be possible to address them for Python 3.5)
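
Two of the points above are easy to illustrate in a few lines. This is only a sketch, assuming Python 3.5 or later for the bytes interpolation, and the sample header and filename are invented for the example. The first part shows binary domain interpolation with no text decoding involved, while the second shows how the surrogateescape error handler lets improperly encoded bytes pass through the text domain and come back out unchanged.

```python
# Binary domain interpolation: the result is bytes, and no text
# encoding or decoding step is involved (requires Python 3.5+).
header = b"Content-Length: %d\r\n" % 42

# These bytes are latin-1 encoded, so they are NOT valid UTF-8.
raw = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate
# code point instead of raising UnicodeDecodeError...
text = raw.decode("utf-8", errors="surrogateescape")

# ...and encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(header)
print(ascii(text))
print(round_tripped == raw)
```

The escaped text isn't meaningful as text (it contains a lone surrogate), but it can be handed back to the operating system without corrupting the original data.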

More broadly, each major platform has its own significant challenges to address:

  • for POSIX systems, there are still a lot of systems that don't use UTF-8 as the preferred encoding and the assumption of ASCII as the preferred encoding in the default C locale is positively archaic. There is also a lot of POSIX software that still believes in the "text is just encoded bytes" assumption, and will happily produce mojibake that makes no sense to other applications or systems.
  • for Windows, keeping the old 8-bit APIs around was deemed necessary for backwards compatibility, but this also means that there is still a lot of Windows software that simply doesn't handle multilingual computing correctly.
  • for both Windows and the JVM, a fair amount of nominally multilingual software actually only works correctly with data in the basic multilingual plane. This is a smaller problem than not supporting multilingual computing at all, but was quite a noticeable problem in Python 2's own Windows support.
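
The "only works correctly in the basic multilingual plane" problem comes from confusing UTF-16 code units with code points. As a small sketch (assuming Python 3.3 or later, where len() counts code points), the following compares the two counts for a single character outside the BMP:

```python
# U+1F40D (SNAKE) lies outside the Basic Multilingual Plane.
s = "\U0001F40D"

# Python 3.3+ counts code points, so this is a single character.
code_points = len(s)

# UTF-16 needs a surrogate pair (two 16-bit code units) for it.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)
```

Software that indexes or slices by UTF-16 code unit will treat that one character as two, which is the same kind of wrong answer the old narrow builds of Python 2 could produce.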

Mac OS X is the platform most tightly controlled by any one entity (Apple), and they're actually in the best position out of all of the current major platforms when it comes to handling multilingual computing correctly. They've been one of the major drivers of Unicode since the beginning (two of the authors of the initial Unicode proposal were Apple engineers), and were able to force the necessary configuration changes on all their systems, rather than having to work with an extensive network of OEM partners (Windows, commercial Linux vendors) or relatively loose collaborations of individuals and organisations (community Linux distributions).

Modern mobile platforms are generally in a better position than desktop operating systems, mostly by virtue of being newer, and hence defined after Unicode was better understood. However, the UTF-8 vs UTF-16-LE distinction for text handling exists even there, thanks to the Java inspired Dalvik VM in Android (plus the cloud-backed nature of modern smartphones means you're even more likely to encounter files from multiple machines when working on a mobile device).

Why Python 4.0 won't be like Python 3.0

Newcomers to python-ideas occasionally make reference to the idea of "Python 4000" when proposing backwards incompatible changes that don't offer a clear migration path from currently legal Python 3 code. After all, we allowed that kind of change for Python 3.0, so why wouldn't we allow it for Python 4.0?

I've heard that question enough times now (including the more concerned phrasing "You made a big backwards compatibility break once, how do I know you won't do it again?"), that I figured I'd record my answer here, so I'd be able to refer people back to it in the future.

What are the current expectations for Python 4.0?

My current expectation is that Python 4.0 will merely be "the release that comes after Python 3.9". That's it. No profound changes to the language, no major backwards compatibility breaks - going from Python 3.9 to 4.0 should be as uneventful as going from Python 3.3 to 3.4 (or from 2.6 to 2.7). I even expect the stable Application Binary Interface (as first defined in PEP 384) to be preserved across the boundary.

At the current rate of language feature releases (roughly every 18 months), that means we would likely see Python 4.0 some time in 2023, rather than seeing Python 3.10.
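
For anyone that wants to check that arithmetic, here's a back-of-the-envelope sketch. It assumes Python 3.4.0 (released in March 2014) as the baseline and a fixed 18 month gap between feature releases; neither the helper name nor the fixed cadence is an official schedule.

```python
from datetime import date

RELEASE_INTERVAL_MONTHS = 18          # assumed fixed cadence
python_3_4 = date(2014, 3, 1)         # 3.4.0 came out in March 2014

# Feature releases after 3.4, with 4.0 directly following 3.9.
later_releases = ["3.5", "3.6", "3.7", "3.8", "3.9", "4.0"]

def projected_release(n):
    """Project the month of the nth feature release after 3.4."""
    months = (python_3_4.month - 1) + n * RELEASE_INTERVAL_MONTHS
    return date(python_3_4.year + months // 12, months % 12 + 1, 1)

projection = {
    version: projected_release(n)
    for n, version in enumerate(later_releases, start=1)
}
for version, when in projection.items():
    print(version, when.strftime("%Y-%m"))
```

Six releases at 18 months each adds up to nine years after 3.4, which is where the 2023 estimate comes from.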

So how will Python continue to evolve?

First and foremost, nothing has changed about the Python Enhancement Proposal process - backwards compatible changes are still proposed all the time, with new modules (like asyncio) and language features (like yield from) being added to enhance the capabilities available to Python applications. As time goes by, Python 3 will continue to pull further ahead of Python 2 in terms of the capabilities it offers by default, even if Python 2 users have access to equivalent capabilities through third party modules or backports from Python 3.

Competing interpreter implementations and extensions will also continue to explore different ways of enhancing Python, including PyPy's exploration of JIT-compiler generation and software transactional memory, and the scientific and data analysis community's exploration of array oriented programming that takes full advantage of the vectorisation capabilities offered by modern CPUs and GPUs. Integration with other virtual machine runtimes (like the JVM and CLR) is also expected to improve with time, especially as the inroads Python is making in the education sector are likely to make it ever more popular as an embedded scripting language in larger applications running in those environments.

For backwards incompatible changes, PEP 387 provides a reasonable overview of the approach that was used for years in the Python 2 series, and still applies today: if a feature is identified as being excessively problematic, then it may be deprecated and eventually removed.

However, a number of other changes have been made to the development and release process that make it less likely that such deprecations will be needed within the Python 3 series:

  • the greater emphasis on the Python Package Index, as indicated by the collaboration between the CPython core development team and the Python Packaging Authority, as well as the bundling of the pip installer with Python 3.4+, reduces the pressure to add modules to the standard library before they're sufficiently stable to accommodate the relatively slow language update cycle
  • the "provisional API" concept (introduced in PEP 411) makes it possible to apply a "settling in" period to libraries and APIs that are judged likely to benefit from broader feedback before offering the standard backwards compatibility guarantees
  • a lot of accumulated legacy behaviour really was cleared out in the Python 3 transition, and the requirements for new additions to Python and the standard library are much stricter now than they were in the Python 1.x and Python 2.x days
  • the widespread development of "single source" Python 2/3 libraries and frameworks strongly encourages the use of "documented deprecation" in Python 3, even when features are replaced with newer, preferred, alternatives. In these cases, a deprecation notice is placed in the documentation, suggesting the approach that is preferred for new code, but no programmatic deprecation warning is added. This allows existing code, including code supporting both Python 2 and Python 3, to be left unchanged (at the expense of new users potentially having slightly more to learn when tasked with maintaining existing code bases).
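
To make the distinction in that last point concrete, here's a minimal sketch contrasting the two styles of deprecation. The function names (old_api, legacy_api, new_api) are hypothetical and exist purely for illustration.

```python
import warnings

def new_api():
    """The preferred interface for new code."""
    return 42

def old_api():
    """Programmatic deprecation: callers get a warning at runtime."""
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()

def legacy_api():
    """Documented deprecation: new code should prefer new_api().

    The note above is the entire deprecation. No warning is emitted,
    so existing single source Python 2/3 code keeps running unchanged.
    """
    return new_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()
    legacy_api()

print([str(w.message) for w in caught])
```

Only the call to old_api() shows up in the recorded warnings; legacy_api() stays silent.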

From (mostly) English to all written languages

It's also worth noting that Python 3 wasn't expected to be as disruptive as it turned out to be. Of all the backwards incompatible changes in Python 3, many of the serious barriers to migration can be laid at the feet of one little bullet point in PEP 3100:

  • Make all strings be Unicode, and have a separate bytes() type. The new string type will be called 'str'.

PEP 3100 was the home for Python 3 changes that were considered sufficiently non-controversial that no separate PEP was considered necessary. The reason this particular change was considered non-controversial was because our experience with Python 2 had shown that the authors of web and GUI frameworks were right: dealing sensibly with Unicode as an application developer means ensuring all text data is converted from binary as close to the system boundary as possible, manipulated as text, and then converted back to binary for output purposes.

Unfortunately, Python 2 doesn't encourage developers to write programs that way - it blurs the boundaries between binary data and text extensively, and makes it difficult for developers to keep the two separate in their heads, let alone in their code. So web and GUI framework authors have to tell their Python 2 users "always use Unicode text. If you don't, you may suffer from obscure and hard to track down bugs when dealing with Unicode input".

Python 3 is different: it imposes a much greater separation between the "binary domain" and the "text domain", making it easier to write normal application code, while making it a bit harder to write code that works with system boundaries where the distinction between binary and text data can be substantially less clear. I've written in more detail elsewhere regarding what actually changed in the text model between Python 2 and Python 3.
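
As a small sketch of what that separation looks like in practice (the normalise_name helper and its sample input are invented for this example, and UTF-8 is an assumed encoding), the function below decodes at the input boundary, works purely on text, and encodes again on the way out. The second half shows Python 3 refusing to mix the two domains implicitly.

```python
def normalise_name(raw, encoding="utf-8"):
    """Decode at the boundary, manipulate as text, encode for output."""
    text = raw.decode(encoding)       # binary -> text on the way in
    text = text.strip().title()       # all manipulation happens on text
    return text.encode(encoding)      # text -> binary on the way out

result = normalise_name(b"  zo\xc3\xab smith ")
print(result.decode("utf-8"))

# Python 3 won't silently combine the binary and text domains.
try:
    b"abc" + "def"
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False
print(mixing_rejected)
```

In Python 2, the equivalent of that last concatenation would have triggered an implicit ASCII based conversion, which is where many of the obscure bugs came from.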

This revolution in Python's Unicode support is taking place against a larger background migration of computational text manipulation from the English-only ASCII (officially defined in 1963), through the complexity of the "binary data + encoding declaration" model (including the C/POSIX locale and Windows code page systems introduced in the late 1980's) and the initial 16-bit only version of the Unicode standard (released in 1991) to the relatively comprehensive modern Unicode code point system (first defined in 1996, with new major updates released every few years).

Why mention this point? Because this switch to "Unicode by default" is the most disruptive of the backwards incompatible changes in Python 3 and unlike the others (which were more language specific), it is one small part of a much larger industry wide change in how text data is represented and manipulated. With the language specific issues cleared out by the Python 3 transition, a much higher barrier to entry for new language features compared to the early days of Python and no other industry wide migrations on the scale of switching from "binary data with an encoding" to Unicode for text modelling currently in progress, I can't see any kind of change coming up that would require a Python 3 style backwards compatibility break and parallel support period. Instead, I expect we'll be able to accommodate any future language evolution within the normal change management processes, and any proposal that can't be handled that way will just get rejected as imposing an unacceptably high cost on the community and the core development team.

Some Suggestions for Teaching Python

I recently had the chance to attend a Software Carpentry bootcamp at the University of Queensland (as a teaching assistant), as well as seeing a presentation from one of UQ's tutors at PyCon Australia 2014.

While many of the issues they encountered were inherent in the complexity of teaching programming, a few seemed like things that could be avoided.

Getting floating point results from integer division

In Python 2, integer division copies C in truncating the answer by default:

    $ python -c "print(3/4)"
    0

Promoting to floating point requires type coercion, a command line flag or a future import:

    $ python -c "print(float(3)/4)"
    0.75
    $ python -Qnew -c "print(3/4)"
    0.75
    $ python -c "from __future__ import division; print(3/4)"
    0.75

Python 3 just does the right thing by default, so one way to avoid the problem entirely is to teach Python 3 instead of Python 2:

    $ python3 -c "print(3/4)"
    0.75

(In both Python 2 and 3, the // floor division operator explicitly requests truncating division when it is desired)
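
One detail that may be worth showing students: strictly speaking, // rounds towards negative infinity rather than towards zero, so it only matches C style truncation for non-negative results. A quick sketch, assuming Python 3:

```python
true_result = 3 / 4            # true division always gives a float
floor_result = 3 // 4          # floor division on ints gives an int

# The two kinds of rounding differ once negative numbers are involved.
negative_floor = -3 // 4       # rounds down, towards negative infinity
negative_truncated = int(-3 / 4)   # int() truncates towards zero

print(true_result, floor_result, negative_floor, negative_truncated)
```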

Common Python 2/3 syntax for printing values

I've been using Python 2 and 3 in parallel for more than 8 years now (while Python 3.0 was released in 2008, the project started in earnest a couple of years earlier than that, while Python 2.5 was still in development).

One essential trick I have learned in order to make regularly switching back and forth feasible is to limit myself to the common print syntax that works the same in both versions: passing a single argument surrounded by parentheses.

$ python -c 'print("Hello world!")'
Hello world!
$ python3 -c 'print("Hello world!")'
Hello world!

If I need to pass multiple arguments, I'll use string formatting, rather than the implicit concatenation feature.

$ python -c 'print("{} {}{}".format("Hello", "world", "!"))'
Hello world!
$ python3 -c 'print("{} {}{}".format("Hello", "world", "!"))'
Hello world!

Rather than doing this, the Software Carpentry material that was used at the bootcamp I attended used the legacy Python 2 only print syntax extensively, causing examples that otherwise would have worked fine in either version to fail for students that happened to be running Python 3. Adopting the shared syntax for printing values could be enough to make the course largely version independent.

Distinguishing between returning and printing values

One problem noted both at the bootcamp and by presenters at PyCon Australia was the challenge of teaching students the difference between printing and returning values. The problem is the "Print" part of the Read-Eval-Print-Loop provided by Python's interactive interpreter:

>>> def print_arg(x):
...     print(x)
>>> def return_arg(x):
...     return x
>>> print_arg(10)
10
>>> return_arg(10)
10

There's no obvious difference in output at the interactive prompt, especially for types like numbers where the results of str and repr are the same. Even when they're different, those differences may not be obvious to a student:

>>> print_arg("Hello world")
Hello world
>>> return_arg("Hello world")
'Hello world'
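
The difference comes from print using str while the interactive prompt displays results using repr. A short sketch comparing the two for the values used above:

```python
for value in (10, "Hello world"):
    # For numbers the two forms match; for strings repr adds quotes.
    print(str(value), repr(value), str(value) == repr(value))
```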

While I don't have a definitive answer for this one, an experiment that seems worth trying to me is to teach students how to replace sys.displayhook. In particular, I suggest demonstrating the following change, and seeing if it helps explain the difference between printing output for display to the user and returning values for further processing:

>>> def new_displayhook(obj):
...     if obj is not None:
...         print("-> {!r}".format(obj))
>>> import sys
>>> sys.displayhook = new_displayhook
>>> print_arg(10)
10
>>> return_arg(10)
-> 10

Understanding the difference between printing and returning is essential to learning to use functions effectively, and tweaking the display of results this way may help make the difference more obvious.
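
For instructors that would rather check the effect in a script than at the prompt, here's a rough sketch (assuming Python 3.4+ for contextlib.redirect_stdout). The simulate_prompt helper is made up for this example: it mimics what the interactive interpreter does by calling the function and then passing the result to the display hook.

```python
import contextlib
import io

def new_displayhook(obj):
    # Same hook as above: mark displayed results with an arrow.
    if obj is not None:
        print("-> {!r}".format(obj))

def print_arg(x):
    print(x)

def return_arg(x):
    return x

def simulate_prompt(func, arg):
    """Capture what the prompt would show for func(arg)."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        result = func(arg)          # any printing happens here
        new_displayhook(result)     # then the result is displayed
    return captured.getvalue()

print(repr(simulate_prompt(print_arg, 10)))
print(repr(simulate_prompt(return_arg, 10)))
```

In a real interactive session, assigning sys.displayhook = sys.__displayhook__ puts the original behaviour back once the demonstration is over.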

Addendum: IPython (including IPython Notebook)

The initial examples above focused on the standard CPython runtime, including the default interactive interpreter. The IPython interactive interpreter, including the IPython Notebook, has a couple of interesting differences in behaviour that are relevant to the above comments.

Firstly, it does display return values and printed values differently, prefacing results with an output reference number:

In [1]: print 10
10

In [2]: 10
Out[2]: 10

Secondly, it has an optional "autocall" feature that allows a user to tell IPython to automatically add the missing parentheses to a function call if the user leaves them out:

$ ipython3 --autocall=1 -c "print 10"
-> print(10)
10

This is a general purpose feature that allows users to make their IPython sessions behave more like languages that don't have first class functions (most notably, IPython's autocall feature closely resembles MATLAB's "command syntax" notation for calling functions).

It also has the side effect that users that use IPython, have autocall enabled, and don't use any of the more esoteric quirks of the Python 2 print statement (like stream redirection or suppressing the trailing newline) may not even notice that print became an ordinary builtin in Python 3.

On Wielding Power

Making the usually implied disclaimer completely explicit on this one: the views expressed in this article are my own, and do not necessarily reflect the position of any organisations of which I am a member.

Power is an interesting thing, and something that, as a society at large (rather than the specialists that spend a lot of time thinking about it), we really don't spend enough time giving serious consideration to. Trust and fear, hope and despair, interwoven with the complex dynamics of interpersonal relationships.

The most obvious kind of power is based on fear: people listening when you tell them what to do, based on a fear of the consequences if they ignore you. Many corporations have traditionally operated on this model: do what you're told, or you'll be fired. "You might lose your job" then hangs as an implicit threat behind every interaction with your management chain, and a complex web of legal obligations and social safety nets has arisen (to a greater or lesser degree in different countries) to help manage the effectiveness of this threat and redress the power imbalance. Fear based power is also, ultimately, the kind of power embodied in the legal system.

That's not the only kind of power though, and this post is largely about another form of it: power based on trust.

Power based on trust

Fear based power can be transferred fairly effectively: disobeying a delegate can be punished as severely as disobeying the original authority and so it goes. Interpersonal considerations don't get much consideration in such environments - they're about getting the job done, without any real concern for the feelings of the people doing it.

The efficiency of that kind of centralised control degrades fairly quickly though - with everyone being in constant fear of punishment, a whole lot of effort ends up being expended on figuring out what the orders are, communicating the orders, ensuring the orders have been followed, requesting new orders when the situation changes, recording exactly what was done to implement the orders and ensuring that if anything goes wrong it was the original orders that were to blame rather than the people following them and so on and so forth. It's like a human body that has no local reflexes, but instead has to think through the idea of removing its hand from a hotplate as an act of deliberate will.

There's a different kind of power though, summed up well in this YouTube video. What that kind of power is based on is the idea that once people have their core survival needs met, there are three key motivators that often work better than money: autonomy, mastery and purpose. (Note: this is after core survival needs are met. If people are still stressed about food, shelter, their health and their personal relationships, then autonomy, mastery and purpose can go take a hike)

At its best, an environment based on autonomy, mastery and purpose is one of mutual trust and respect. The purpose of the overall organisation and its individual components is sufficiently well articulated that everyone involved understands their responsibilities and how their efforts contribute to the greater whole, individuals are given a high degree of autonomy in determining how best to meet their obligations, and are supported in the pursuit of the required mastery to fulfil those obligations as well as possible.

This is the kind of distributed trust that Silicon Valley tries to sum up in its "move fast and break things" motto, but fails miserably in doing so. The reason? Those last two words there: "break things". It's an incredibly technocratic view of the world, and one that leaves out the most important element of any situation: the people.

This is a key point many technologists miss: ultimately, technology doesn't matter. It is not an end unto itself - it is only ever a means to an end, and that end will almost always be something related to people (we humans are an egocentric bunch). When you "break things" you hurt people, directly or indirectly. Now, maybe those things needed to be broken (and a lot of them do). Maybe those things were already hurting people, and the change just shifts (and hopefully lessens) the burden. But the specific phrasing in the Silicon Valley motto is one of cavalier irresponsibility, of freedom from consequences. "Don't think about the people that may be hurt by your actions - just move fast and break things, that's what we do here!".

This is NOT OK.

Yes, it needs to be OK to break things, whether deliberately or by mistake. Without that, "autonomy" becomes a myth, and we are left with stagnation. However, there's a difference between doing so carelessly, without accounting for the impact on those that may be harmed by the chosen course of action, and doing so while taking full responsibility for the harm your actions may have caused.

And with that, it's time to shift topics a bit. I assure you they're actually related, which may become clearer further down.

What is a corporation?

The glib answer here would be "a toxic cesspool of humanity", and I'll grant that's a fair description of a lot of them (see the earlier observations regarding fear based power). I am a capitalist though (albeit one that is strongly in favour of redistributive tax systems), so I see more potential for good in them than many other folks do.

So I'm going to give my perspective on the way some of the non-toxic ones work when running smoothly, at least in regards to three roles: the Chief Financial Officer, the Chief Technology Officer and the Chief Executive Officer. (You may choose not to believe me when I say non-toxic corporations are a real thing, but I assure you, such companies do exist, as most people don't actually like working for toxic cesspools of humanity. It's just that fully avoiding the descent into toxicity as organisations grow is, as yet, an unsolved problem in society. Radical transparency does seem to help a lot, though. Something about the cleansing power of sunlight and competing centres of power...).

The non-toxic CFO role is pretty straightforward: their job is to make sure that everyone gets paid, and the company not only survives, but thrives.

The non-toxic CTO role is also pretty straightforward: they're the ultimate authority on the technological foundations of an organisation. What's on the horizon that they need to be aware of? What's growing old and needs to be phased out in favour of something more recent? What just needs a bit of additional investment to bring it up to scratch?

The role of a CEO is a lot less clear. "Finance" is pretty clear, as is "Technology". But what does "Executive" mean? They're not just in charge of the executives - they're in charge of the whole company.

My take on it? The CEO is ultimately the "keeper of the company culture". They ultimately decide not only what gets done, but also how it gets done. While they have a lot of other important responsibilities, a key one in my mind is that it is the CEO's job to make sure that both the CFO and CTO remember to account for the people that will ultimately be tasked with handling "execution". They're the ones that say "no, we're not cutting that, it's important to the way we operate - we need to find another way to save money" (remember, we're only talking about the non-toxic corporations here).

So when employees of a corporation expect that company to do the right thing by them? They're trusting the CEO. Not the CFO. Not the CTO. The CEO. Arguably the key defining characteristic of a non-toxic corporation is that the CEO is worthy of that trust, as they will not only make those commitments to their employees, but also put the mechanisms in place to ensure the commitments are actually met. (This doesn't require any kind-hearted altruism on the CEO's part, by the way. You can get the same outcome through hard-nosed capitalism since making honest commitments to your staff and then keeping them is what "our people are our greatest asset" actually looks like in practice - it's just that a lot of organisations that say that don't actually mean it)

Commenting on other people's business

And that brings us to the specific reason I sat down to write this article: a tweet I posted earlier today regarding Mozilla's public debate over the board's choice of CEO. Specifically, I wrote:

I would take Eich accepting the Mozilla CEO role to mean his personal pride matters more to him than Mozilla's mission.

That's a pretty bold statement to make about someone I don't know and have never even met, and in relation to an organisation that I don't have any direct relationship with beyond being a user of their software and a fan of their mission.

Note the things I didn't suggest. I didn't suggest he resign from his existing position as CTO. I didn't suggest that the Mozilla board withdraw their offer of the CEO position. However, I did state that, from my perspective as an outsider that wants to see Mozilla execute on their mission to the best of their ability, "trust me" is a sufficiently big call to have to make for a role as critical as the CEO position that I don't believe Eich should be asking that of his fellow members of the Mozilla community. Actions have consequences, and one of those consequences can be "No, you no longer have the right to request our trust - you actively hurt us, and we don't believe you when you say you wish to make amends".

To Eich's credit, he at least didn't just say "trust me", but rather made a number of specific commitments. However, the time to build that credibility in a relatively open organisation is before accepting such a significant role, not after. Otherwise, there will always be a lingering doubt for affected individuals that any public statements are a matter of the responsibilities of the position, rather than a genuine change in personal convictions. When it comes to matters like a commitment to inclusiveness you don't want a CEO that is going through the motions out of a sense of obligation: this stuff is hard work and tempting to skimp on, even when you do care about it at a personal level (as a case in point - it would have been so much easier for me to not comment on this situation at all, that I almost left it at just a couple of vague allusions on Twitter rather than getting specific).

Separating the personal from the professional is always difficult, and in few places more so than the CEO role. During my tenure at Boeing, we had two CEOs asked to resign due to unprofessional conduct. The military industrial complex is a sordid mire of duplicitous misbehaviour and waste that makes the open source technology community look like saints by comparison (and I'll let you draw your own conclusions as to what it says about me personally that I survived there for more than a decade), yet even they were of the opinion that personal conduct matters at the CEO level, even more so than in other less prominent roles.

For the record, I personally do hope Eich's newfound commitment to inclusiveness is genuine, and that the concerns raised regarding his appointment as CEO prove to be unfounded. I'd prefer to live in a world where the blog post linked above represents a genuine change of heart, rather than being merely a tactical consideration to appease particular groups.

Ultimately, though, my opinion on this topic doesn't matter - it's now up to Eich to demonstrate through his actions that he's worthy of the trust that the Mozilla board have placed in him, and for concerned members of the Mozilla community to decide whether they're willing to adopt a "wait and see" approach as suggested, or if they consider that belated request in and of itself to be an unacceptable breach of their trust.

Change the Future - one small slice of PyCon US 2013

I'm currently kicking back in Red Hat's Mountain View office (I normally work from the Brisbane office in Australia) after a lovely lunch with some of the local Red Hatters, unwinding a bit and reflecting on an absolutely amazing week at PyCon US 2013 just down the road in Santa Clara.

For me, it started last Wednesday with the Python Language Summit, an at-least-annual-sometimes-biannual get together of the developers of several major Python implementations, including CPython (the reference interpreter), PyPy, Jython and IronPython. Even with a full day, there were still a lot of interesting topics we didn't get to and will be thrashing out on the mailing lists as usual. However, good progress was made on a few of the more controversial items, and there are definitely exciting developments in store for Python 3.4 (due in early 2014, probably shortly after PyCon in Montreal if past history is anything to go by).

Thursday was a real eye-opener for me. While I did have to duck out at one point for a meeting with a couple of the other CPython developers, I spent most of it helping out at the second of the Young Coders tutorials run by Katie Cunningham and Barbara Shaurette. These tutorials were conducted using Raspberry Pis with rented peripherals, and the kids attending received both the Pi they were using as well as a couple of introductory programming books.

Watching the class, and listening to Katie's and Barbara's feedback on what they need from us in the core substantially changed my perspective on what IDLE can (and, I think, should) become. Roger Serwy (the creator of IdleX, a version of IDLE with various improvements) has now been granted access to the CPython repo to streamline the process of fixing the reference implementation, and we're working on plans to make the behaviour of IDLE more consistent across all currently supported Python versions (including Python 2.7). (Some aspects of this, especially Roger's involvement, are similar to what happened years ago for Python 2.3 when Kurt B. Kaiser, the PSF's treasurer, shepherded the reintegration of the IDLEfork project and its major enhancements to IDLE back into the reference IDLE implementation in the Python standard library).

Friday saw the start of the conference proper, with inspirational keynotes from Jesse Noller (conference chair and PSF board member) on helping to change the future by changing the way we introduce the next generation to the computers that are now an ever-present aspect of our lives, and from Eben Upton (co-founder of the Raspberry Pi foundation), on how the Pi came to be the educational project it is today, and some thoughts on how it might evolve into the future.

Jesse's keynote included the announcement that every attendee (all 2500 of them) would be receiving a free Raspberry Pi, and that any Pis that attendees didn't want to claim would be redistributed to various educational groups and programs. Not only that, but Jesse also announced a new site for sharing Raspberry Pi based projects and resources, as well as a "Raspberry Pi Hack Lab" running for the duration of the conference, where attendees could hook their Pis up to a keyboard and monitor, as well as experiment with various bits and pieces of electronics donated by one of the conference sponsors. Richard Jones also stepped up to run some additional short introductory PyGame tutorials in the lab (he had run a full 3 hour session on PyGame as part of the paid tutorials on the Wednesday and Thursday prior to the conference).

One key personal theme for the conference revolved around the fact that I've volunteered to be Guido's delegate in making the final decisions on how we reshape Python's packaging ecosystem in the lead up to the Python 3.4 release. I'll be writing quite a bit more on that topic over the coming weeks, so here I'll just note that it started with proposing some changes to the Python Enhancement Proposal process at the language summit on the Wednesday, continued through the announcement of the coming setuptools/distribute merger on Thursday, the "packaging and distribution" mini-summit I organised for developers on the Friday night, the "Directions in Packaging" Q&A panel we conducted on the Saturday afternoon, some wonderful discussions with Simeon Franklin on his blog regarding the way the current packaging and distributions issues detract from Python's beginner friendliness and on into various interesting discussions, proposals and development at the sprints in the days following the conference.

Unfortunately, I didn't actually get to meet Simeon in person, even though I had flagged his poster as one I really wanted to go see during the poster session. Instead, I spent that time at the Red Hat booth in the PyCon Jobs Fair. The Jobs Fair is a wonderful idea from the conference organisers that, along with the Expo Hall, recognises the multi-role nature of PyCon: as a community conference for sharing and learning (through the summits, scheduled talks, lightning talks, poster session, open spaces, paid tutorials, Young Coders sessions, Raspberry Pi hack lab, and sprints), as a way for sponsors to advertise their services to developers (through the Expo Hall and sponsor tutorials) and as a way for sponsors to recruit new developers (through the Jobs Fair). PyCon has long involved elements of all of these things (albeit perhaps not at the scale achieved this year), but having the separate Expo Hall and Jobs Fair helps keep sales and recruitment activity from bleeding into the community parts of the conference, while still giving sponsors a suitable opportunity to connect with the development community.

Both at the Jobs Fair and during the rest of the conference, I was explaining to anyone who was willing to listen what I see as Red Hat's role in bridging the vast gulf between open source software enthusiasts (professionals and amateurs alike) and people for whom software is merely a tool that either helps (hopefully) or hinders (unfortunately far too often) them in spending time on their actual job/project/hobby/etc.

I also spent a lot of time talking to people about my actual day job. I'm the development lead for Beaker, one of the test systems at Red Hat, and while it is very good at what it does (full stack integration testing from hardware, through the OS and up into application software), it also needs to integrate well with other systems like autotest and OpenStack if we're going to avoid pointlessly reinventing a lot of very complicated wheels. Learning more about what those projects are currently capable of makes it easier for me to prioritise the things we work on, and make suitable choices about Beaker's overall architecture.

At the sprints, in addition to working on CPython and some packaging related questions, I also took the opportunity to catch up with the Mailman 3 developers - the open source world needs an email/web forum gateway that at least isn't actively awful, and the combination of Mailman 3 with the hyperkitty archiver is shaping up to be positively wonderful.

I didn't spend the entire conference weekend talking to people - I actually got to go see a few talks as well. All of the talks I attended were excellent, but some particular personal highlights were Mike Bayer's deep dive into SQLAlchemy's session behaviour, the panel on the Boston Python Workshop and a number of other BPW-inspired education and outreach events, Mel Chua's whirlwind tour of educational psychology, Lynn Root's educational projects for new coders (with accompanying website), Dave Malcolm's follow-up on his efforts with static analysis of all of the CPython extensions in Fedora, and Dave Beazley's ventures into automated home manufacturing of wooden toys (and destruction of laptop hard drives). There were plenty of other talks that looked interesting but that I unfortunately didn't get to (one of the few downsides of having so many impromptu hallway conversations). All the PyCon US 2013 talks should be showing up online as the presenters give the thumbs up, and the presentation slides are also available, so it's worth trawling through the respective lists for the topics that interest you.

In the midst of all that, Van Lindberg (PSF chairman) revealed the first public draft of the redesigned python.org (I was one of the members of the review committee that selected Project Evolution, RevSys and Divio as the drivers of this initial phase of the redesign process), and also announced the successful resolution of the PSF's trademark dispute in the EU.

This was only my second PyCon in North America (I've been to all three Australian PyCons, and attended PyCon India last year) and the first since I joined Red Hat. Meeting old friends from around the world, meeting other Pythonistas that I only knew by reputation or through Twitter and email, and meeting fellow Red Hatters that I had previously only met through IRC and email was a huge amount of fun. Attending the PyLadies charity auction, visiting the Computer History Museum with Guido van Rossum, Ned Deily and Dwayne Litzenberger (from Dropbox), chatting with Stephen Turnbull about promoting the adoption of open source and open source development practices in Japan, and getting to tour a small part of the Googleplex were just a few of the interesting bonus events from the week (and now I have a few days vacation to do the full tourist thing here in SFO).

I'm still on an adrenaline high, and there are at least a dozen different reasons why. If everything above isn't enough, there were a few other exciting developments happening behind the scenes that I can't go into yet. Fortunately, the details of those should become public over the next few weeks, so I won't need to contain myself for too long.

This week was intense, but awesome. All the organisers, volunteers and sponsors that played a part in bringing it together should be proud :)