Switching to Android

So my new HTC Desire S arrived today which means it is time for an app downloading spree... (starting with just freebies for now while I figure out what I do and don't like, and delete some of the cruft that HTC/Telstra added)

Already grabbed:
Google Goggles
Google Sky
Google Reader
Dropbox
Firefox
KeepassDroid
OI File Manager
Shelves for Android (now actually *scanning* my book collection will be quite a project...)
Barcode Scanner
Compass

And a ton of standard apps from Google/HTC/Telstra for all the basics (Phone, SMS, Music, Mail/Gmail, FB, Twitter, Camera, Calendar, Clock, Weather, Calculator, Adobe PDF Reader, Maps/Navigation, LED Flashlight, etc).

Things I know I want but don't have yet:

Ebook reader (I used Stanza from Lexcycle on the iPhone, but they don't make an Android version)
Weight tracker (don't need anything fancy, just something that I can import old data into and will give me a time-weighted average)

I'm also open to suggestions for things I might want but just don't know it yet, and of course I'll have to track down a few idle time games.

Fixing GRUB2 update issues with Kubuntu 11.04

After doing a dist-upgrade to 11.04 a while back, my Kubuntu machine refused to boot.

I eventually tracked this down to the GRUB2 os-prober feature freaking out and trying to boot off the partition that held only the "/usr" directory rather than the one with the root and "/boot" directories. (Why, I have no idea. The latter is the first partition and the one that holds the MBR, so os-prober is clearly going to some effort to find and enforce the wrong partition).

After searching on Google and with a bit of experimentation, I was able to fix it by booting off a LiveCD, adding the line "GRUB_DISABLE_OS_PROBER=true" to "/etc/default/grub" and then running "sudo update-grub".

Musings on the culture of python-dev

I mentioned at the end of my PyCon summary post that several people had told me that they find python-dev to be a hostile and unwelcoming environment, and that (after some reflection on the matter) I could actually see their point. We may be the model of civility compared to somewhere like the Linux Kernel Mailing List or (*shudder*) Blizzard's World of Warcraft forums, but mere civility is a far cry from being a consciously welcoming place.

I'll say up front that python-dev itself is unlikely to change any time soon. There are reasons it is the way it is, and I'll elaborate on some of them later in this post. In the meantime, python-ideas is available as a venue where outlandish ideas won't be rejected quite as abruptly, and the core-mentorship list has been set up specifically to provide a gentler introduction to the ways and means of core development without having to jump right in at the deep end by posting to python-ideas or python-dev. Questions about development with Python will continue to be redirected in fairly short order to python-list or python-tutor.

And, of course, what follows is just my opinion. Ask another veteran python-dev poster what they think, and you'll likely get a different answer. Ask an actual newcomer to (or lurker on) python-dev, and they'll probably have a different answer, too.

Python evolves too slowly!

You're changing the language too fast!

If I had to choose just one explanation for the frequency of abrupt responses in python-dev, the tension between the above two statements would be it. Compared to application-level open source projects, Python actually evolves quite slowly. 18+ months between minor version increments? 10 years between major versions? That's crazy! Canonical releases an entire new OS version every 6 months!

On the flip side, however, for a programming language definition, Python evolves quite fast. There's no such thing as a minor release for C or C++ - the last versions of those were C99 and C++98 respectively. We should see C++11 published by the end of the year, and C1X is still in development. CPython's own PEP 7 still mandates the use of C89 compatible constructs for portability reasons, even though C89 is older than one of our release managers.

Java hasn't had a major feature update since 2006, and even C# is only running at a new version every 2-3 years (with the last formally standardised version being 2.0 back in 2006).

Python also has a history of being more aggressive with deprecations than languages backed by large corporate sponsors. We only have limited volunteer resources, so rather than letting old code bitrot (or else take up maintenance time when it breaks), we'd prefer to rip it out in favour of an improved alternative. However, "more aggressive" is still pretty slow - some deprecated features stuck around for nearly 10 years (until Python 3 came out), and even a "fast" deprecation has historically taken at least 3 years (x.y contains the feature, x.y+1 deprecates it, x.y+2 removes it). It's highly likely that this deprecation period will be extended by a release for the 3.x series, pushing the minimum lifetime of a new feature that later proves to be a mistake out to nearly 5 years.
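
To make that release arithmetic a little more concrete, here's a minimal sketch (with entirely hypothetical function names) of how a typical deprecation plays out in code: the old API keeps working for at least one release while emitting a DeprecationWarning, before finally being removed.

```python
import warnings

def new_api(data):
    """The improved replacement added in version x.y+1 (hypothetical)."""
    return sorted(data)

def old_api(data):
    """Deprecated in x.y+1, scheduled for removal in x.y+2 (hypothetical)."""
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return new_api(data)
```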

The task of updating the language and the standard library is a balancing act between those two forces - we want to make life easier for programmers adopting Python for new activities, while preserving backwards compatibility for existing applications. This is why Python 3 is such a big deal - most decisions are made with the emphasis on the needs of existing Python programmers, but the decision to create Python 3 was largely for the benefit of future Python programmers. That means that all current Python programmers are lumped with the task of actually managing a disruptive transition that wasn't really designed for their immediate benefit. Obviously, the collective opinion of python-dev is that it will be worth the pain in the long run, but those anticipated benefits don't make the migration any easier to deal with right now.

We get quite a few people coming into python-dev and betraying quite quickly that they don't have any respect for the time frames involved in language (rather than application) development. Telling someone "You're wrong, but explaining the real reasons why would require that I distil decades of experience down into a single email post and I don't feel like taking the time to do that right now, since even if I tried you would ignore me anyway" tends to be difficult to phrase politely.

A lot of the rest of this post is really just elaborations on the theme of why a programming language needs to evolve more slowly than most other pieces of software.

Heart of the ecosystem, but far from the whole of it

I've elaborated on the cost of change before, when discussing the design principle "Status Quo Wins a Stalemate". The core point is that any significant change made by python-dev, even one that will ultimately be a positive one, imposes a high near-term cost as the implications of the change ripple out across the whole Python ecosystem. As noted above, newcomers can easily perceive this as high-and-mighty arrogance rather than the voice of experience.

What do you mean by "cognitive burden"?

Even without considering the near-term cost of changes, every addition to the language (and even the standard library) imposes a potential burden on anyone learning the language in the future. You can't just say, "Oh, I won't worry about learning that feature" if the code base you've been asked to maintain uses it, or if it is offered as an answer to a query posted on python-list or Stack Overflow or the like. The principle of "There Should Be One - and preferably only one - Obvious Way To Do It" is aimed squarely at reducing the cognitive load on people trying to learn the language. Quite clearly, the "only one" aspect is an ideal rather than a practical reality (two kinds of string formatting and three argument parsing libraries in the standard library all say "Hi!"), but in such cases we do try to indicate that the most recently added (and hopefully least quirky) approach is the preferred way to do it.
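
As a quick illustration of that duplication (and of the "preferred way" convention), both of these lines produce the same output, with the str.format() spelling being the newer, nominally preferred one:

```python
name, quantity = "spam", 3

# Older printf-style interpolation
print("%s x %d" % (name, quantity))

# Newer str.format() method (the nominally preferred approach)
print("{} x {}".format(name, quantity))
```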

This idea is also encountered as the aphorism "Not every three line function needs to be a builtin". Again, new posters may not take kindly to being told that their idea simply doesn't cut it as a potential language addition.

Gee, how dumb are you lot? Why don't you just...?

Another favourite bugbear is posters that bounce into python-dev assuming that we're a collection of clueless idiots that can't see the obvious. Collectively, we do pay quite a bit of attention to what other language communities are doing, as well as having personal experience with what does and doesn't work in actual programming practice. There's a reason that "new" Python features are generally modelled on something that has been demonstrated to work elsewhere (e.g. list comprehensions et al inspired by Haskell, the with statement partially inspired by C++ RAII, the new string formatting syntax inspired by C# string formatting).

New posters that give the list a tiny bit of credit and do us the courtesy of at least asking "Has this been thought of or discussed before? Are there any problems with it that I haven't considered?" tend to get significantly more positive reactions than those that start with a tone closer to "Here is my awesome idea, and you are seriously dumb if you don't get it and decide to adopt it immediately!". Positive responses are even more likely if ideas are posted to the right list (i.e. python-ideas).

You do remember you didn't have to pay a cent for this, right?

A fortunately rare (but still annoying when it arises) source of negative reactions is the end user that comes in demanding to know why certain things aren't being done, when the answer is "Because nobody stepped up to either do it themselves, or to pay for someone else to do it". It's a pretty simple equation, really, and not demonstrating understanding of it suggests a complete disregard for the volunteer nature of so many of the contributions that have been made to Python over the years.

In the end, we're still just people

We like painting bikesheds (or, more to the point, we can't always help ourselves, even when we know better). We like to be right and "win" arguments (or sometimes we simply need time to process and properly understand the point someone else is trying to make). Even those of us who are paid by our employers to work with Python are typically still participating in python-dev and hacking on CPython in our spare time rather than as part of the job, so there's not a lot of tolerance for "noise" and "time wasting".

As I'm a firm believer in the phrase "Vigorous criticism is the only known antidote to error", there are limits to how much I would personally want the culture of python-dev to change. Moving "blue sky" dreaming to python-ideas, "how does the process work?" coaching to core-mentorship and VCS management issues to python-committers allows each of those lists to develop a culture more appropriate to its specific activity, freeing python-dev to really focus in on the "vigorous criticism" part of the story. Keeping that from crossing the line into excessive negativity and a total reluctance to change is an ongoing challenge, but hopefully an awareness of that danger and the occasional pause for reflection will be enough to keep things on the right track.

The benefits (and limitations) of PYC-only Python distribution

This Stack Overflow question hit my feed reader recently, prompting the usual discussion about the effectiveness of PYC only distribution as a mechanism for obfuscating Python code.

PYC Only Distribution

In case it isn't completely obvious from the name, PYC only distribution is a matter of taking your code base, running "compileall" (or an equivalent utility) over it to generate the .pyc files, and then removing all of the original .py source files from the distributed version.

Plenty of Python programmers (especially the pure open source ones) consider this practice an absolute travesty and would be quite happy to see it disallowed entirely. Early drafts of PEP 3147 (PYC Repository Directories) in fact proposed exactly that - in the absence of the associated source file, a compiled PYC file would have been ignored.

However, such blatant backwards incompatibility aroused protests from several parties (including me), and support for PYC-only distribution was restored in later versions of the PEP (although "compileall" now requires a command line switch in order to generate the files in the correct location for PYC-only distribution).
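
As a rough sketch of what that looks like in practice on Python 3.2+ (with a hypothetical project directory called "myapp"), the legacy flag tells compileall to drop the .pyc files alongside the sources rather than into the __pycache__ directories introduced by PEP 3147 - I believe the equivalent command line switch is -b:

```python
import compileall
import os

# Compile everything under "myapp", writing the .pyc files next to the
# sources rather than into __pycache__ (needed for PYC-only distribution)
compileall.compile_dir("myapp", legacy=True)

# Then strip the original source files, leaving only the bytecode behind
for dirpath, dirnames, filenames in os.walk("myapp"):
    for name in filenames:
        if name.endswith(".py"):
            os.remove(os.path.join(dirpath, name))
```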

Use Cases

As I see it, there are a couple of legitimate use cases for PYC-only distribution:
  • Embedded firmware: If your code is going onto an embedded system where space is at a premium, there's no point including both your source code and the PYC files. Better to just include the compiled ones, as that is all you really need
  • Cutting down on support calls (or at least making the ones you do get more comprehensible): Engineers and scientists like to tinker. It's in their nature. When they know just enough Python to be a danger to themselves and others, you can get some truly bizarre tickets if they've been fiddling with things and failed to revert their changes correctly (or didn't revert them at all). Shipping only the PYC files can help make sure the temptation to fiddle never even arises

Of the two, the former is by far the stronger use case. The latter is attempting a technical solution to a social problem and those rarely work out well in the long run. Still, however arguable its merits, I personally consider deterrence of casual modifications a valid use case for the feature.

Drawbacks

Stripping the source code out of the distribution does involve some pretty serious drawbacks. The main one is the fact that you no longer have the ability to fall back to re-compilation if the embedded magic cookie doesn't match the execution environment.

This restricts practical PYC-only distribution to comparatively constrained environments that can ensure a matching version of Python is available to execute the PYC files, such as:
  • Embedded systems
  • Corporate SOEs (Standard Operating Environments)
  • Bundled interpreters targeting a specific platform

Cross-platform compatibility of PYC files (especially for 32-bit vs 64-bit and ARM vs x86) is also significantly less robust than the cross-platform compatibility of Python source code.
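
If you're wondering what the version check actually involves, the magic cookie is just the first four bytes of the .pyc file, which can be compared against the running interpreter's own value. A rough sketch (the file path is just an example):

```python
import imp

def pyc_matches_interpreter(path):
    """Report whether a .pyc file's magic cookie matches this interpreter."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == imp.get_magic()

print(pyc_matches_interpreter("myapp/__init__.pyc"))
```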

Limitations

Going back to the SO question that most recently got me thinking about this topic, the big limitation to keep in mind is this: shipping only PYC files will not reliably keep anyone from reading your code. While comments do get thrown away by the compilation process, and docstrings can be stripped with the "-OO" option, Python will always know the names of all the variables at runtime, so that information will always be present in the compiled bytecode. Given both the code structure and the original variable names, most decent programmers are going to be able to understand what the code was doing, even if they don't have access to the comments and docstrings.
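
A quick session with the dis module shows just how much survives compilation - the comments are gone, but every local and global name is still right there in the bytecode:

```python
import dis

def calculate_discount(price, customer_tier):
    # This comment disappears when the function is compiled...
    loyalty_bonus = 0.05 * customer_tier
    return price * (1 - loyalty_bonus)

# ...but "price", "customer_tier" and "loyalty_bonus" all show up
# in the disassembly of the compiled code object
dis.dis(calculate_discount)
```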

While there aren't any currently active open source projects that provide full decompilation of CPython bytecode, such projects have existed in the past and could easily exist again in the future. There are also companies which provide Python decompilation as a paid service (decompyle and depython are the two that I am personally aware of).

Alternatives

You can deter casual tinkering reasonably well by placing your code in a zip archive with a non-standard extension (even .py!). If you prepend an appropriate shebang line, you can even mark it as executable on POSIX based systems (see this post for more information).
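
A rough sketch of that trick (all the file names here are just examples): Python will execute a zip archive that contains a top-level __main__.py, and prepending the shebang line doesn't invalidate the archive, since zip files are indexed from the end:

```python
import zipfile

# Bundle the application into a zip archive with a top-level __main__.py
with zipfile.ZipFile("app.zip", "w") as zf:
    zf.writestr("__main__.py", "print('Hello from inside the archive')\n")

# Prepend a shebang line so POSIX systems can execute the result directly
with open("app.zip", "rb") as f:
    archive = f.read()
with open("myapp", "wb") as f:
    f.write(b"#!/usr/bin/env python\n" + archive)

# chmod +x myapp && ./myapp   (or just run it as: python myapp)
```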

You could also write your code in Cython or RPython instead of vanilla Python and ship fully compiled executable binaries.

There are minifier projects for Python (such as mnfy) that could be fairly readily adapted to perform obfuscation tricks (such as replacing meaningful variable names with uninformative terms like "_id1").

Climate change "skepticism": a text book case of the "Tragedy of the Commons"

So, with the government moving to implement a carbon tax, the issue of climate change and attempts to mitigate it are once again a big deal here in Australia.

Inspired by a friend's comment comparing carbon taxes with fines for littering, along with a few of the responses to that comment, I started pondering the similarities and differences between the current attempts to deal with carbon dioxide emissions, and the effective cost internalisation schemes created in the 1990s that severely reduced industrial sulphur dioxide emissions, as well as various current laws that prohibit dumping of toxic waste in developed nations (and their unintended side effects).

(Fair warning before I wade in: I've simplified quite a few things, particularly on the acid rain front. This post is long enough as it is, without making it even longer with all the relevant caveats about other causes of acid rain and any other details I've glossed over as being irrelevant to the main point of the article. It's a general overview, not a peer reviewed scientific paper)

A tale of two gases
The key similarity between sulphur dioxide induced acid rain and carbon dioxide induced global warming is hopefully fairly obvious: they both represent a classic economic "externality", a cost borne by someone other than the person responsible for causing it. The industries and individuals emitting these pollutants don't suffer any immediate harmful consequence through the normal action of nature.

In situations like that, regulation in one form or another is the only effective tool we have to ensure that at least some portion of those external costs is reflected back on the responsible individuals. As noted in the link above, this approach proved extraordinarily effective in reducing acid rain in the US, drastically cutting sulphur emissions at a fraction of the originally predicted cost.

However, there are a few key differences that have contributed to the discussion over carbon dioxide taking a rather different path to that over sulphur dioxide:

  1. It's hard to deny the harmful effects of sulphur dioxide emissions when plants, fish and insects are dying due to the excessive levels of acidity in rain and ground water. By contrast, it is very hard to convey clearly why a small rise in average global temperatures is going to be such a bad thing, or even how human carbon dioxide emissions contribute to that happening.
  2. Acid rain from sulphur emissions is generally a local problem with local causes (from a continental perspective, anyway). Stop emitting sulphur dioxide in the US and acid rain stops falling in the US. Significantly reduce carbon dioxide emissions in Australia (or even the US) though, and we're likely still screwed if other countries don't follow suit.
  3. The "sulphur" cycle (how long it takes for the sulphur emissions to be deposited back on the ground as acid rain) is significantly shorter than that of carbon, so efforts to reduce emissions will have an effect on water acidity levels in a reasonably short time frame
Comparisons with anti-waste dumping laws are similar: the effects of toxic waste dumping are generally local, obvious and able to be cleaned up within years rather than decades, so it is easy to get support for laws prohibiting such practices, even if the end result of those laws is just to export the problem to a poorer country.

Manufacturing doubt and exploiting the "Grey Fallacy"
Acid rain is an easy problem to sell: the acid damages plants and objects directly, so the harm can be easily conveyed through pictures and videos. The problem of climate change, though, is far, far harder to illustrate simply, since it is a signal buried in some very noisy weather patterns.

A fundamental issue is the fact that people don't really experience climate, but instead experience weather. The day to day and seasonal variations in the weather massively exceed the scale of the underlying trends being discussed by climate scientists. The fact that predicting the weather in a few days time is harder than predicting long term climate trends is actually massively counterintuitive, even though almost all scientists and engineers are familiar with the fact that random fluctuations at a small scale may average out to something almost perfectly predictable at a large scale (compare the unpredictable nature of quantum physics with the straightforward determinism of classical Newtonian mechanics).

We're also not really used to the idea of directly affecting the balance of the planet on a global scale. In absolute terms, the effects of the sun, the oceans and the plants on the carbon cycle all dwarf the human contribution. The idea that the non-human forcings were actually largely in balance with each other, and that the comparatively small human contribution is enough to tip that balance to the point of producing more carbon than can be absorbed by natural mechanisms is really quite a subtle one.

The scientific background needed to truly understand and come to grips with the hows and the whys of the IPCC predictions reminds me of this comic regarding the knowledge needed to even begin to understand the vagaries of string theory (make sure to click through on the comic itself to see the whole chain of images).

If there wasn't anyone around with a vested interest in maintaining the status quo (at least for a while longer), this likely wouldn't be a problem. Scientists, politicians, economists and engineers could proceed with the development of mitigation strategies, while also attempting to educate the general populace as to the reasons behind any actions taken. Since such vested interests do exist, however, their opposition makes it significantly harder to convince the lay public that there is a real problem here.

Because climate science is such a complex topic, it is actually quite hard to explain to a non-scientist just why the scientific consensus is as strong as it is. If you oversimplify, you veer into territory where statements are so oversimplified that they border on being false. However, if you're coming from the other angle and want to persuade people that the science "isn't settled", then you're no longer constrained by the need to be accurate and can just go with whatever sounds more intuitively plausible and/or better caters to people's natural inclinations and prejudices.

Sites like Skeptical Science do their best to clearly explain why the oft-repeated "skeptic" arguments are basically BS (and do it well), but to someone only following the Cliff Notes mainstream media version of the debate, the temptation is still very, very strong to just assume the truth lies somewhere between the two positions being debated.

Telling people to "do their own research" doesn't really help all that much in practice. Telling the BS from the valid science is itself a fine art that takes a great deal of skill and experience. Being a veteran of arguments with creationists (and even intelligent design advocates) is actually quite beneficial, since the global warming "skeptics" use many of the same rhetorical tricks as creationists do when attempting to deny the fact of evolution (incessant use of such deceptive tactics is actually one of the major hints that someone is trying to sell you a line rather than just stating the truth as they see it). The most common tactic used by both groups is to drag out a thoroughly rebutted argument for each new audience, in hopes that they aren't aware of the existence of a rebuttal.

For example, just as most people can't answer "How could a bombardier beetle possibly evolve?" off the top of their heads - I've actually forgotten all of the plausible answers myself - neither can they answer questions like "If the climate is supposed to be warming, why is it colder now than it was during the Middle Ages?". While that one is actually fairly straightforward to answer (ice cores and other data show that the medieval warming was likely localised to Europe due to various currents in the Atlantic, but this merely shifted heat around, so that other parts of the world were correspondingly colder), there are dozens of other oft-repeated, thoroughly debunked arguments, and it's basically impossible for a mere interested observer to remember them all. As it turns out, I had actually misremembered the correct explanation for the Medieval Warm Period, so the point above isn't quite right and invites an attack on the grounds that I don't know what I'm talking about. To some extent that's actually true - my opinion on climate science issues is based on a meta-analysis of the trustworthiness of various information sources (including the full IPCC reports), since I'm not inclined to spend a few decades studying up and redoing all the science myself. Fortunately, Skeptical Science has the full story for this question and many others (the MWP was actually colder than the latter half of the 20th century, despite higher solar activity and lower volcanic activity), so correcting my error is the task of a few moments. And if you're still wondering about the bombardier beetle thing, TalkOrigins has the full story for that, too.

Fortunately, at the decision making level here in Australia, this part of the debate seems to be coming to a close, with even Tony Abbott (the leader of the opposition) at least paying lip service to the fact that human-induced increases in average global temperatures are now a fact of life. However, the mainstream media is still happy to "teach the controversy" in the hopes of picking up a few more eyeballs (or ears) to sell to their advertisers.

But the problem is so much bigger than us!
However, even once you get agreement that human-induced global warming due to excessive carbon dioxide emissions is a genuine problem, you then run into the "But we can't do anything about it!" attitude.

Many Australians (including some elected members of our Federal parliament) go "Oh, but the US/Europe/Brazil/Russia/India/China have a much bigger impact than we do. Doing anything will hurt our international competitiveness without really achieving anything, so we shouldn't bother."

Even the higher emission countries point fingers at each other. The US won't budge until China does. The BRIC countries are waiting for the US to make a move, and use US inaction as justification for similarly doing nothing in the meantime.

An absolutely textbook "Tragedy of the Commons" reaction. And we know how that story ends, don't we? In the long run everybody loses, nobody wins.

How do you fix it? Rules, regulations and social pressure. The "community of nations" isn't just words. The complex web of interdependencies that spans the world gives countries real power to exert influence on each other. Once some countries start to make a move towards serious carbon emission control strategies, then that becomes a bargaining chip to use in international negotiations. Will it work? Who knows. The only thing we know for sure is that the more carbon we pump into the atmosphere over the next few decades, the higher the impact from global warming will be (it's now guaranteed that there will be some impact - the carbon cycle is too long for us to prevent that at this late stage of the game).

Ideally you would want to adopt an emissions trading scheme along the lines of that used to curb sulphur emissions, but the wholehearted embrace of dodgy pseudoscience by far too many members of our Federal Opposition party spiked that particular gun.

So a carbon tax it is (for now, anyway). The hysterical cries of "Oh my god, you've doomed the country!" ring loudly from those that will bear most of the direct costs of actually internalising the negative effects their carbon emissions have on the wider environment. Their motives are, to say the least, a little bit suspect. The political opportunism involved in our Federal Opposition leader backing them is disappointing, but unsurprising.

In the PM's words
There's a nice summary of the current situation in a Fairfax opinion piece that went out over Gillard's byline:
The longer we leave it the harder action on climate change gets. This reform road is a hard one to walk. Just as doing nothing is not an option, we need to be careful to ensure that we do not make decisions that will cost our economy and jobs.
There's a definite risk that the government's plans won't live up to their ambitions. However, the world's past experience with the sulphur dioxide doomsayers should arouse some deep skepticism - and not of the kind directed towards those wanting to press forward with plans to curb carbon emissions.

Queensland's "unelected" leader of the opposition

Following the declaration of Brisbane's Lord Mayor that he is running for state parliament at the next election, and will be the leader of the opposition for that campaign, there is a trope making the rounds that he is somehow "unelected".

Out of curiosity, I decided to see how his numbers in the Brisbane City Council elections in 2008 stacked up against Anna Bligh's numbers in the 2009 state election.

The internet being what it is, official sources for these numbers weren't too hard to dig up:
I found the Lord Mayoral results on the Brisbane City Council site
I found the Electorate of South Brisbane results on the Electoral Commission Qld site (with the preferences taken into account here).

Results:
Campbell Newman received 335,076 first preference votes for Lord Mayor (60% of the total).
After distribution of preferences, he received a total of 339,320 votes (66% of the total).
It's probably worth noting that the second placed candidate received 29% of the first round votes, increasing to 34% after distribution of preferences, so Newman actually did receive the majority of the remaining preferences, there just weren't many to go around.

Anna Bligh received 12,243 first preference votes in her electorate of South Brisbane (48% of the total).
After distribution of preferences, she received a total of 14,697 votes (65% of the total).

So, our State Premier holds her position on the back of the support of her party and around 15k people living in or near South Brisbane.

The new leader of the opposition will hold that position on the back of the support of his party and around 340k people living in the City of Brisbane.

Since the current population of Qld is estimated at 4.5 million people, effectively 7.5% of the state voted for Newman to be Brisbane's Lord Mayor, while only 0.3% voted for Bligh to represent the seat of South Brisbane.
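
For anyone who wants to check my arithmetic, the back-of-the-envelope calculation is simply:

```python
qld_population = 4.5e6

print(339320 / qld_population * 100)  # ~7.5% of the state backed Newman for Lord Mayor
print(14697 / qld_population * 100)   # ~0.3% of the state backed Bligh in South Brisbane
```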

And people are calling Newman an unelected leader?

(Yes, I'm aware that the nature of parliamentary governments based on geographic representation means that most constituents don't get to vote directly for their parliamentary leaders. The only purpose of this post is to point out how dumb that makes the "unelected" gibe sound when it is aimed at the directly elected Lord Mayor of a city the size of Brisbane)

Thoughts and Impressions following PyCon 2011

I'm back home following my inaugural trip to PyCon, so it seems like a good time to record my impressions of the summits, conference and sprints.

I really enjoyed the whole experience - kudos to Van Lindberg, Jesse Noller and the rest of the volunteer team for organising everything. I'm glad to see the "apprenticeship" setup for the conference leadership continuing, with Jesse (the deputy coordinator this year) stepping up to coordinate Santa Clara, with the future coordinator for Montreal assisting during those two years.

The personal connection side of things was brilliant. When it comes to the folks that were already on python-dev before I really started to follow it back in late 2003, I've been interacting and working with them online for 8+ years, and there are of course many others that have joined python-dev in the intervening time. I'd seen a few of them in photos and videos, and heard a few in videos and podcasts, but by and large, this was the first time I had been able to connect faces and voices to names on the screen. Very cool stuff, including getting to meet Raymond Hettinger (who accepted my first patches back in '04, among them the looks-insane-but-it-works string hack to speed up the 2.x decimal module) and of course Guido himself (who was the one who actually granted me commit rights, essentially for making sense while arguing with him about PEP 340/346/343).

Getting ready for PyCon was actually also the motivation behind restarting this blog and adding it to Planet Python, finally getting myself a Twitter account (@ncoghlan_dev) and (after getting home) hooking my DISQUS profile up to that. They're all aspects of taking a bit more of an active part in the wider Python community after getting a taste of it at PyCon AU last year (despite the fact that I have yet to make it to a BrisPy meeting... Wednesday night just isn't a good night for me these days).

From a more technical perspective, there were a few things that I found particularly interesting:

1. PyPy is definitely starting to make the transition from "experimental platform for research into dynamic language optimisation" to "let's use this to make production code go faster". This shows not only in their benchmark results, but also in their efforts to update their website to be more production-user friendly and the effort to get more major projects running on it at the sprints, including those that stress the boundaries of the distinction between the language definition and CPython implementation details (*cough*SQLAlchemy*cough*). One of those efforts actually revealed a bug in one of the dark corners of the CPython implementation (folks in Hanover F at the sprints may have heard me swearing about my various attempts at fixing that one...)

2. There is definite interest in supporting Python 3 in more modules and packages, as well as improving the available information out there regarding published packages. There's likely to be at least one after-the-fact PEP to better explain one of the major C API changes that bit some sprinters attempting to forward port zc.buildout (I think that was the affected package), there is collaboration developing amongst the Linux distros (and others) to get more existing packages on Python 3 (join the python-porting list if that project interests you), there are a couple of new sites with improved information on the level of Python 3 support in various modules, and the team behind djangopackages are working on providing the same service for the whole of PyPI (and, no doubt, Python 3 support will end up being one of the points of comparison).

3. With distutils2 entering the standard library as "packaging" in 3.3 (to reflect the scope creep in the mission of the package, as well as to avoid name conflicts with future backports of distutils2 post 3.3 release), it was fascinating listening to the sprinters discussing how to take their clean 3.3 code from the standard library and backport it (as distutils2) to run on 3.2, 3.1, 2.7, 2.6, 2.5 and 2.4 without cluttering the stdlib version with backwards compatibility cruft. If their results are a match for their goals, then their new 2to3 and 3to2 inspired tool may end up becoming the basis for a general purpose Python "backport" transformation technique that is able to iteratively downgrade Python code to support earlier versions, while still allowing the use of clean, idiomatic code in the latest version.

4. The understanding of how best to leverage the Mercurial transition is still evolving on python-dev. My personal opinion has now developed towards one where I hope we will start using more feature clones (rather than branches within the main repository), with the main cpython repository only being used to accept feature-complete (or near complete) contributions. We're actually pretty close to supporting that model now, it just needs a few tweaks to the way builds are pushed to the buildbots to get us the rest of the way to being able to trial code on the full buildbot fleet without having to push it into the main repository first.

5. Collaboration efforts between the 5 biggest Python implementations (CPython, PyPy, Jython, IronPython, Stackless) continue to evolve. The PSF announced $10k in direct funding to PyPy at the start of the conference proper, the main python.org download page now includes links to the sites of the other 4 major implementations, more contributors to the other projects were given CPython push rights to allow standard library fixes to be contributed upstream rather than maintained in downstream forks, there are plans in progress to create a collaborative "speed.python.org" modelled on the existing PyPy benchmark site and Brett Cannon plans to revive the PEP about splitting the standard library out to a separate source control repository.

6. Brian Curtin has a nice write-up welcoming several newcomers that made their first submissions to the CPython bug tracker at the sprints. Brett's idea of improving test coverage as an introductory activity is a *great* idea, since such changes are relatively easy to get started with, relatively easy to review, low risk of breaking anything (except the buildbots) and involve actually writing code. I'll also note here that Eric Snow spent the sprints working on a more esoteric idea that came out of a python-ideas discussion: see what would be involved in providing a variant on exec() that allowed an actual function body to be executed in a designated namespace.

I was also surprised (and somewhat concerned) at the number of people that perceived python-dev as a hostile, unwelcoming place. On further reflection, I realised there was actually some merit to that point of view, but that's a topic for another post.

Python Language Summit - Highlights

The language summit covered a fair bit more ground than the VM summit did, so this post only covers the topics that I personally found particularly interesting. My rough notes at least mention everything that was discussed.

Moar Speed
The question of speed, optimisations and benchmarking came up again. A lot of this was just retreading the ground from the VM summit with a wider audience. One thing that did become clearer is that the near-term points of contact for the speed.python.org concept are Maciej Fijałkowski from the PyPy team for the technical issues and Jesse Noller on the PSF side for the hosting/organisational issues (although I expect Jesse would welcome offers of assistance if anyone else, particularly existing PSF members, wanted to step up and help coordinate things).

Communicating the collective opinions of python-dev
Where are we going, what are we doing? The main communications channels for python-dev have historically been PEPs (including release PEPs), the What's New document for each release and of course the mailing list itself. The python-dev summaries project has been tried a couple of times, but generally burned out the people involved.

Doug Hellmann (PSF communications officer) would like to try a setup where there is an official python-dev technical blog where major discussions and decisions (including the outcomes of PEPs) can be presented in easier-to-swallow chunks, giving the gist of significant decisions and discussions, with references back to the source PEPs and mailing list threads.

It's an interesting idea but, as Guido pointed out, it will likely require *new* people to step forward - people interested in helping to provide a window into the goings-on of python-dev (hopefully the more interesting parts, where we aren't just arguing about the colour of the bikeshed du jour). From a personal point of view, I know I've only just really started using *this* blog to talk about my own perspective on Pythonic stuff. Something that may be practical is for the python.org technical blog to highlight posts where existing core devs are talking about new and upcoming stuff on their personal blogs.

Doug Hellmann is the point of contact for anyone interested in following up on this.

Policy on use of accelerator modules
There are a few unwritten policies regarding the use of implementation-specific accelerator modules to speed up parts of the standard library (such as "always test both versions", "the accelerated version should be API compatible with the Python version", "the interpreter should still work if the accelerated version is missing").

Brett Cannon has volunteered to write these down in an official policy PEP. While CPython is likely the main offender here, it will be suggested that other implementations follow the same policy for their own accelerator modules. Patches to bring CPython more in line with this policy, including providing pure Python alternatives to existing C-only modules, are definitely of interest.
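
The underlying pattern is simple enough, and several stdlib modules (heapq, for instance) already follow it: the pure Python module ends with an optional import of the accelerated version, so the C implementation takes over when it's present and the Python fallback keeps things working when it isn't. A rough sketch, with a hypothetical _mymodule accelerator:

```python
# mymodule.py - the pure Python version lives in the standard library

def count_items(iterable):
    """Pure Python fallback implementation."""
    total = 0
    for _ in iterable:
        total += 1
    return total

try:
    # If the accelerator is available, its definitions replace the ones above;
    # if it isn't, the interpreter still works with the pure Python versions
    from _mymodule import *  # hypothetical C accelerator module
except ImportError:
    pass
```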

Compatibility warnings
With the rise in significance of alternate implementations, some grey areas in the language definition (such as the reliance on refcounting semantics, abuse of the ability to store non-string keys in CPython namespaces, storing objects that implement the descriptor protocol in classes without considering the consequences) are potential sources of confusion when they break on other implementations (or potentially even in future versions of CPython).

ResourceWarning was added a while back to cover the refcounting issue, and uncovered a few bugs in CPython and its test suite. The proposal is to add CompatibilityWarning as a peer exception to ResourceWarning and use it for cases where people are relying on CPython implementation accidents that aren't officially supported by the language definition.
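
For anyone who hasn't met ResourceWarning yet, it's silenced by default, but easy enough to switch on - a small sketch of the kind of code it complains about:

```python
# Run with: python -W default::ResourceWarning script.py
# (or -W error::ResourceWarning to make the warnings fatal)

def read_config(path):
    # The file object is never closed explicitly; CPython's refcounting
    # cleans it up promptly, but other implementations may not
    return open(path).read()

def read_config_safely(path):
    # Closing deterministically keeps every implementation happy
    with open(path) as f:
        return f.read()
```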

Nobody has stepped forward to write the PEP for this as yet, but it may make an interesting sprint topic (I know at least Brett and I are around for the sprints, and there should be a few other CPython core devs kicking around).

Better exception info
ImportError will likely acquire a "module" attribute during the sprints (this is an easy one, since it will just reference the module's name). There are other exceptions that could probably do with having the repr() of critical bits of information stored separately on the exception object (e.g. KeyError, ValueError, IndexError) for easy programmatic access.

Using repr() helps avoid issues with reference cycles keeping things around longer than intended. However, the API for creating such enhanced exceptions would still need to be worked out, as well as how best to deal with cases where third party code has only populated the exception message without filling in the specific details. Technically, even ImportError isn't immune to that concern, as raising it is sometimes the responsibility of third party PEP 302 importers and loaders.
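
Nothing has been agreed yet, but the general shape being discussed looks something like this entirely hypothetical sketch, where the interesting value is stashed on the exception as a repr() rather than being recoverable only by parsing the message:

```python
def lookup(mapping, key):
    try:
        return mapping[key]
    except KeyError as exc:
        # Hypothetical enrichment: store the repr() of the missing key on the
        # exception so callers don't need to parse str(exc) to recover it
        exc.key_repr = repr(key)
        raise

try:
    lookup({"a": 1}, "missing")
except KeyError as exc:
    print(exc.key_repr)  # "'missing'"
```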

Python Language Summit - Rough Notes

Same drill, different day, more people, more notes :)

Still just my interpretation, though. Will probably highlight a few things I find particularly interesting again tomorrow (as I did for the VM summit).

PSF Communications (Doug Hellmann)
- currently writing about PSF funding and similar activities
- would like to include more technical material summarising python-dev discussions
- how best to go about that
- new blog, not existing PSF blog
- existing PSF board not in the position to do it
- Guido: core devs already do a lot via PEPs and mailing list, likely not keen to write blog as well
- may be better to get others to do it, willing to follow discussions of interest
- posts may be primarily pointers to other resources (e.g. PEPs, mailing list posts)
- all implementations
- major new releases should go on python.org as NEWS items

Warnings for things that may cause problems on different implementations
- ResourceWarning helps to pick up reliance on CPython specific refcounting
- CompatibilityWarning for reliance on non-strings in namespaces (e.g. classes)
- update Language Spec to clarify ambiguous situations
- like ResourceWarning, silence CompatibilityWarning by default
- what to do about builtin functions that currently aren't descriptors (i.e. doesn't change behaviour when retrieved from a class)
- e.g. make staticmethod objects directly callable
- big gray area in language spec - callables may not be descriptors
- perhaps change CPython builtin descriptors to warn about this situation
- another use case for CompatibilityWarning
- Guido not convinced the builtin function problem can be handled in general
- a better callable variant of staticmethod may be better, that allows the default descriptor behaviour to be easily stripped from any function
- doesn't want to require that all builtin functions follow descriptor protocol, since it is already the case that many callables don't behave like methods
- a better staticmethod would allow the descriptor protocol to be stripped, ensuring such functions can be safely stored in classes without changing behaviour

Standard Library separation
- see VM summit notes
- over time, migrate over to separate repository for standard lib, update
- need Python and C modules stay in sync
- buildbots for standard library
- challenge of maintaining compatibility as standard lib adopts new language changes
- need a PEP to provide guarantees that C accelerators are kept in sync (Brett Cannon volunteered to write the PEP)
- bringing back pure Python alternatives to C standard library is encouraged, but both need to be tested
- accelerator modules should be subsets of the Python API
- Brett will resurrect standard library PEP once importlib is done
- full consolidation unlikely to be possible for 2.7 (due to CPython maintenance freeze)

Speed Benchmarking
- see VM summit notes
- really good for tracking performance changes across versions
- common set of benchmarks
- OSU OSL are willing to host it
- backend currently only compares two versions
- first step is to get up and running with Linux comparisons first, look at other OS comparisons later
- hypervisors mess with performance benchmarks, hence need real machines
- should set up some infrastructure on python.org (benchmark SIG mailing list, hg repository)
- eventually, redirect speed.pypy.org to new speed.python.org
- longer term, may add new benchmarks

Exception data
- need to eliminate need to parse error strings to get info from exceptions
- should be careful that checks of message content aren't overly restrictive
- PEP 3151 to improve IO error handling? (Guido still has some reservations)
- ImportError needs to name module
- KeyError, IndexError, ValueError?
- need to be careful when it comes to creating reference loops
- exception creation API also an issue, since structured data needs to be provided

Contributor Licensing Agreements
- Jesse and Van looking to get electronic CLAs set up
- will ensure adequately covers non-US jurisdictions

Google Summer of Code
- encouraging proposals under the PSF umbrella

Packaging
- distutils2 should land in 3.3 during the sprints
- namespace packages (PEP 382) will land in 3.3
- external name for backports should be different from internal name
- too late to introduce a standard top level parent for stdlib packages
- external backports for use in older versions is OK
- external maintenance is bad
- hence fast development cycles incompatible with stdlib
- want to give distutils2 a new name in stdlib for 3.3, so future backports based on version in 3.4 won't conflict with the standard version in 3.3

Python 3 adoption
- py3ksupport.appspot.com (Brett Cannon)
- supplements inadequate trove classifier data on PyPI with manual additions
- Georg Brandl has graphical tracker of classification data on PyPI over time
- Allison Randal/Barry Warsaw have been doing similar dependency tracking and migration info for Ubuntu
- giant wiki page for Fedora Python app packaging
- good dependency info would provide a good ranking system for effectively targeting grants
- 3.python.org? getpython3.com? need to choose an official URL
- funding may help with PyPy migration
- IronPython will be looking at 3.x support once 2.7 is available (this week/next week timeframe)
- Jython focused on 2.6 now, may go direct to 3.x after that (haven't decided yet)
- PSF funding needs a specific proposal with specific developer resources with the necessary expertise and available time
- CObject->Capsule change is a compatibility issue for C extension modules
- Django targeting Python 3 support by the end of summer
- zc.buildout is a dependency of note that hasn't been ported yet (Pycon sprint topic)
- other migration projects being tackled at Pycon sprints (webop?)

Python upstream and distro packaging
- PEP 394 - recommendations for symlinks practices
- PEP 3147 and 3149 were heavily targeted at helping distros share directories across versions
- namespace packages (PEP 382)
- PEP 384 stable ABI (done for 3.2)
- better tools needed to help with migration to stable ABI

Baseline Python distro installs
- system python varies in terms of what is installed
- challenging to target, as available modules vary
- "build from source" is only a partial answer as some build dependencies are optional
- distros make some changes to support differences in directory layouts
- some changes affect Python app dependencies (e.g. leaving out distutils)
- conflict between "system Python" use case of what is needed to run distro utilities and "arbitrary app target" for running third party apps
- distributing separate Python under app control is not ideal, due to security patch management issues
- specific problems are caused by removal of stuff from base install (e.g. distutils)
- other problem is when distro uses old versions of packages (but virtualenv can help with that)
- may help if a "python-minimal" was used for the essential core, with "python" installing all the extras (including distutils, tkinter, etc)
- then have a further python-extras (or equivalent) that adds everything else the distro needs for its own purposes
- distros tend to work by taking a CPython build and then splitting it up into various distro packages
- to handle additions, would be good to be able to skip site-packages inclusion in sys.path (ala virtualenv).
- "-S" turns off too much (skips site.py entirely, not just adding site-packages to sys.path)
- "-s" only turns off user site-packages, not system site-packages

Python 3.3 proposed changes to strings to reduce typical memory usage
- PEP 393 changes to internal string representation (implementation as GSoC project)
- Unicode memory layout currently split in order to more easily support resizing and subclassing in C
- need to build and measure to see speed and memory impacts
- alternative idea may be to explore multiple implementation techniques (similar to PyPy)

Speed (again!)
- Unladen Swallow dormant. Major maintainers moved on to other things, fair bit of work in picking it up
- even trying to glean piecemeal upgrades (e.g. to cPickle) is a challenge
- interest in speeding up Python has really shifted to PyPy
- for CPython, gains would need to be really substantial to justify additional complexity
- really need to get the macro benchmarks available on 3.x
- Guido: pickle speedup experience is to be cautious, even when the speed gains are large.
- speed hack attempts on CPython are still of interest, especially educational ones
- speeding up overall is a very hard problem, but fixing specific bottlenecks is good
- stable ABI will help
- PyPy far more sensitive to refcounting bugs than CPython
- static analysis to pick up refcounting bugs could help a great deal
- "Here there be dragons": Unladen Swallow shows that overall speedups are not easy to come by

Regex engine upgrade
- new regex library proposed
- added many new features, including the Unicode categories needed to select out Python 3.x identifiers
- potentially big hassle for other implementations since re module includes a lot of C
- IronPython currently translates to .NET compatible regexes, but could rewrite more custom code

GUI Library
- Guido: GUI libraries are nearly as complicated as the rest of Python put together and just aren't a good fit with the release cycle of the standard lib
- Don't want to add another one, but don't want to remove Tcl/Tk support either

twisted.reactor/deferred style APIs in the standard library
- asyncore/asynchat still has users
- would like to have an alternative in the stdlib that offers a better migration path to Twisted
- deferred could be added, such that asyncore based apps can benefit from it
- reactor model separates transport/protocol concerns far more cleanly than asyncore
- protocol level API and transport level API for asyncore may be a better option
- would allow asyncore based applications to more easily migrate to other async loops
- defining in a PEP would allow this to be the "WSGI" for async frameworks ("asyncref", anyone?) (Jesse suggested concurrent.eventloop instead)
- still need someone to step up to write the PEP and integrate the feedback from the Twisted team and the other async frameworks
- plenty of async programming folks able to help and provide feedback (including glyph)
- having this standardised would help make event loop based programming more pluggable
- Guido still doesn't like the "deferred" name
- Glyph considers deferred to be less important than standardising the basic event loop interface

Python VM Summit - Somewhat Coherent Thoughts

Yay, sleep :)

Last night I just dumped my relatively raw notes into a post. This review is more about distilling what was discussed over the day into a few key themes.

Speed Good

One major point was to do with "How do we make Python fast?". Dave Mandelin (Mozilla Javascript dev) was asking how open CPython was to people tinkering with JIT and other technologies to try and speed up execution, and it was acknowledged that python-dev's reaction to such proposals is rarely more than lukewarm. A large part of that resistance comes from the fact that CPython is generally portable to many more architectures than the real speed hacks (which are generally x86 + x86-64 + ARM at best, and sometimes not even all 3 of those). Unladen Swallow also lost a lot of steam, as so much of their effort was going into tasks not directly related to "make CPython faster" (e.g. fixing LLVM upstream bugs, getting benchmarks working on new versions).

Instead, we tend to push people more towards PyPy if they're really interested in that kind of thing. Armin decided years ago (when switching his efforts from psyco to PyPy) that "we can't get there from here", and it's hard to argue with him, especially given the recent results from the benchmarks executed by speed.pypy.org.

There was definitely interest in expanding the speed.pypy.org effort to cover more versions of more interpreters. We don't actually have any solid data in CPython regarding the performance differences between 2.x and 3.x (aside from an expectation that 3.x is slower for many workloads due to the loss of optimised 32 bit integers, additional encoding/decoding overhead when working with ASCII text, the new IO stack, etc). We aren't even sure of the performance changes within the 2.x series.

That last is the most amenable to resolution in the near term - the benchmarks run by speed.pypy.org are all 2.x applications, so the creation of a speed.python.org for the 2.x series could use the benchmarks as is. Covering 3.x as well would probably be possible with a subset of the benchmarks, but others would require a major porting effort (especially the ones that depend on twisted).

Champions and specific points of contact for this idea aren't particularly obvious at this stage. Jesse is definitely a fan of the idea, but has plenty on his plate already, so it isn't clear how that will work out from a time point of view. There'll likely need to be some self-organisation from folks who are interested in the project and aren't already devoting their Python-related energies to something else.

The Python Software Foundation, not the CPython Software Foundation

The second major key point was the PSF (as represented by Jesse Noller from the board, and several other PSF members, including me, from multiple VMs) wanting to do more to support and promote implementations other than CPython. We are definitely at the point where all 4 big implementations are an excellent choice depending on the target environment:

  • CPython: the venerable granddaddy, compatible with the most C extensions and target environments, most amenable to "stripping" (i.e. cutting it down to a minimal core), likely the easiest sell in a corporate environment (due to age and historically closest ties to the PSF)
  • Jython: the obvious choice when using Python as a glue language for Java components, or as a scripting language embedded in a Java environment
  • IronPython: ditto for .NET components and applications
  • PyPy: now at the point where deployments on standard server and desktop environments should seriously consider it as an alternative to CPython. It's not really appropriate for embedded environments, but when sufficient resources are available to let it shine, it will run most workloads significantly faster than CPython. It even has some support for C extensions, although big ticket items like full NumPy support are still a work in progress. However, if you're talking something like a Django-based web app, then "CPython or PyPy" is now becoming a question that should be asked.

It didn't actually come up yesterday, but Stackless probably deserves a prominent mention as well, given the benefits that folks such as CCP are able to glean from the microthreading architecture.

Currently, however, python.org is still very much the CPython website. It will require a lot of work to get to a place where the other implementations are given appropriate recognition. It also isn't clear whether or not the existing pydotorg membership will go along with a plan to modernise the website design to something that employs more modern web technologies, and better provides information on the various Python implementations and the PSF. While the current site is better than what preceded it, a lot of pydotorg members are still gun-shy due to the issues in managing that last transition (even the recent migration of the development process docs over to a developer-maintained system on docs.python.org encountered some resistance). However, when the broader Python community includes some of the best web developers on the planet, we can and should do better. (A personal suggestion that I didn't think of until this morning: perhaps a way forward on this would be to first build a new site as "beta.python.org", without making a firm commitment to switch until after the results are available for all to see. It's a pretty common way for organisations to experiment with major site revamps, after all, and would also give the pydotorg folks a chance to see what they think of the back-end architecture)

Standardising the Standard Library

Finally, with the hg transition now essentially done, efforts to better consolidate development effort on the standard library (especially the pure Python sections) and the associated documentation will start to gather steam again. As a preliminary step, commit rights (now more accurately called "push rights") to the main CPython repository are again being offered to maintainers from the other major interpreter implementations so they can push fixes upstream, rather than needing to maintain them as deltas in their own repositories and/or submit patches via the CPython tracker.