Posts/2011/03

Climate change "skepticism": a text book case of the "Tragedy of the Commons"

So, with the government moving to implement a carbon tax, the issue of climate change and attempts to mitigate it are once again a big deal here in Australia.

Inspired by a friend's comment comparing carbon taxes with fines for littering, along with a few of the responses to that comment, I started pondering the similarities and differences between the current attempts to deal with carbon dioxide emissions, and the effective cost internalisation schemes created in the 1990's that severely reduced industrial sulphur dioxide emissions, as well as various current laws that prohibit dumping of toxic waste in developed nations (and their unintended side effects).

(Fair warning before I wade in: I've simplified quite a few things, particularly on the acid rain front. This post is long enough as it is, without making it even longer with all the relevant caveats about other causes of acid rain and any other details I've glossed over as being irrelevant to the main point of the article. It's a general overview, not a peer reviewed scientific paper)

A tale of two gases
The key similarity between sulphur dioxide induced acid rain and carbon dioxide induced global warming is hopefully fairly obvious: they both represent a classic economic "externality", a cost borne by someone other than the person responsible for causing it. The industries and individuals emitting these pollutants don't suffer any immediate harmful consequence through the normal action of nature.

In situations like that, regulation in one form or another is the only effective tool we have to ensure that at least some portion of those external costs is reflected back on the responsible individuals. As noted in the link above, this approach proved extraordinarily effective in reducing acid rain in the US, drastically cutting sulphur emissions at a fraction of the originally predicted cost.

However, there are a few key differences that have contributed to the discussion over carbon dioxide taking a rather different path to that over sulphur dioxide:

  1. It's hard to deny the harmful effects of sulphur dioxide emissions when plants, fish and insects are dying due to the excessive levels of acidity in rain and ground water. By contrast, it is very hard to convey clearly why a small rise in average global temperatures is going to be such a bad thing, or even how human carbon dioxide emissions contribute to that happening.
  2. Acid rain from sulphur emissions is generally a local problem with local causes (from a continental perspective, anyway). Stop emitting sulphur dioxide in the US and acid rain stops falling in the US. Significantly reduce carbon dioxide emissions in Australia (or even the US) though, and we're likely still screwed if other countries don't follow suit.
  3. The "sulphur" cycle (how long it takes for the sulphur emissions to be deposited back on the ground as acid rain) is significantly shorter than that of carbon, so efforts to reduce emissions will have an effect on water acidity levels in a reasonably short time frame.
Comparisons with anti-waste dumping laws are similar: the effects of toxic waste dumping are generally local, obvious and able to be cleaned up within years rather than decades, so it is easy to get support for laws prohibiting such practices, even if the end result of those laws is just to export the problem to a poorer country.

Manufacturing doubt and exploiting the "Grey Fallacy"
Acid rain is an easy problem to sell: the acid damages plants and objects directly, so the harm can be easily conveyed through pictures and videos. The problem of climate change, though, is far, far harder to illustrate simply, since it is a signal buried in some very noisy weather patterns.

A fundamental issue is the fact that people don't really experience climate, but instead experience weather. The day to day and seasonal variations in the weather massively exceed the scale of the underlying trends being discussed by climate scientists. The fact that predicting the weather in a few days time is harder than predicting long term climate trends is actually massively counterintuitive, even though almost all scientists and engineers are familiar with the fact that random fluctuations at a small scale may average out to something almost perfectly predictable at a large scale (compare the unpredictable nature of quantum physics with the straightforward determinism of classical Newtonian mechanics).
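To illustrate that last point with a toy example (a back-of-the-envelope simulation, not a climate model, and the trend and noise figures are invented purely for illustration): individual "days" below are dominated by noise, yet averages taken decades apart still reveal the underlying trend.

```python
# Toy illustration: a small warming trend buried in large day-to-day noise.
# The trend and noise figures are invented for illustration purposes only.
import random

TREND_PER_YEAR = 0.02   # hypothetical degrees of warming per year
DAILY_NOISE = 8.0       # hypothetical day-to-day swing (standard deviation)

def decade_average(start_year, days=3650):
    total = 0.0
    for day in range(days):
        year = start_year + day / 365.0
        total += TREND_PER_YEAR * year + random.gauss(0, DAILY_NOISE)
    return total / days

# Any given day is unpredictable to within several degrees, but the averages
# for decades fifty years apart still differ by roughly a full degree.
print(decade_average(0))
print(decade_average(50))
```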

We're also not really used to the idea of directly affecting the balance of the planet on a global scale. In absolute terms, the effects of the sun, the oceans and the plants on the carbon cycle all dwarf the human contribution. The idea that the non-human forcings were actually largely in balance with each other, and that the comparatively small human contribution is enough to tip that balance to the point of producing more carbon than can be absorbed by natural mechanisms is really quite a subtle one.

The scientific background needed to truly understand and come to grips with the hows and the whys of the IPCC predictions reminds me of this comic regarding the knowledge needed to even begin to understand the vagaries of string theory (make sure to click through on the comic itself to see the whole chain of images).

If there wasn't anyone around with a vested interest in maintaining the status quo (at least for a while longer), this likely wouldn't be a problem. Scientists, politicians, economists and engineers could proceed with the development of mitigation strategies, while also attempting to educate the general populace as to the reasons behind any actions taken. Since such vested interests do exist, however, their opposition makes it significantly harder to convince the lay public that there is a real problem here.

Because climate science is such a complex topic, it is actually quite hard to explain to a non-scientist just why the scientific consensus is as strong as it is. If you oversimplify, you veer into territory where statements are so oversimplified that they border on being false. However, if you're coming from the other angle and want to persuade people that the science "isn't settled", then you're no longer constrained by the need to be accurate and can just go with whatever sounds more intuitively plausible and/or better caters to people's natural inclinations and prejudices.

Sites like Skeptical Science do their best to clearly explain why the oft-repeated "skeptic" arguments are basically BS (and do it well), but to someone only following the Cliff Notes mainstream media version of the debate, the temptation is still very, very strong to just assume the truth lies somewhere between the two positions being debated.

Telling people to "do their own research" doesn't really help all that much in practice. Telling the BS from the valid science is itself a fine art that takes a great deal of skill and experience. Being a veteran of arguments with creationists (and even intelligent design advocates) is actually quite beneficial, since the global warming "skeptics" use many of the same rhetorical tricks as creationists do when attempting to deny the fact of evolution (incessant use of such deceptive tactics is actually one of the major hints that someone is trying to sell you a line rather than just stating the truth as they see it). The most common tactic used by both groups is to drag out a thoroughly rebutted argument for each new audience, in hopes that they aren't aware of the existence of a rebuttal.

For example, just as most people can't answer "How could a bombardier beetle possibly evolve?" off the top of their heads - I've actually forgotten all of the plausible answers myself - neither can they answer questions like "If the climate is supposed to be warming, why is it colder now than it was during the Middle Ages?". While that one is actually fairly straightforward to answer (ice cores and other data shows that the medieval warming was likely localised to Europe due to various currents in the Atlantic, but this merely shifted heat around, so that other parts of the world were correspondingly colder), there are dozens of other oft-repeated thoroughly debunked arguments, and it's basically impossible for a mere interested observer to remember them all. As it turns out, I had actually misremembered the correct explanation for the Medieval warm period, so the point above isn't quite right and invites an attack on the grounds that I don't know what I'm talking about. To some extent that's actually true - my opinion on climate science issues is based on a meta-analysis of the trustworthiness of various information sources (including the full IPCC reports), since I'm not inclined to spend a few decades studying up and redoing all the science myself. Fortunately, Skeptical Science has the full story for this question and many others (the MWP was actually colder than the latter half of the 20th century, despite higher solar activity and lower volcanic activity), so correcting my error is the task of a few moments. And if you're still wondering about the bombardier beetle thing, TalkOrigins has the full story for that, too.

Fortunately, at the decision making level here in Australia, this part of the debate seems to be coming to a close, with even Tony Abbott (the leader of the opposition) at least paying lip service to the fact that human-induced increases in average global temperatures are now a fact of life. However, the mainstream media is still happy to "teach the controversy" in the hopes of picking up a few more eyeballs (or ears) to sell to their advertisers.

But the problem is so much bigger than us!
However, even once you get agreement that human-induced global warming due to excessive carbon dioxide emissions is a genuine problem, you then run into the "But we can't do anything about it!" attitude.

Many Australians (including some elected members of our Federal parliament) go "Oh, but the US/Europe/Brazil/Russia/India/China have a much bigger impact than we do. Doing anything will hurt our international competitiveness without really achieving anything, so we shouldn't bother."

Even the higher emission countries point fingers at each other. The US won't budge until China does. The BRIC countries are waiting for the US to make a move, and use US inaction as justification for similarly doing nothing in the meantime.

An absolutely textbook "Tragedy of the Commons" reaction. And we know how that story ends, don't we? In the long run everybody loses, nobody wins.

How do you fix it? Rules, regulations and social pressure. The "community of nations" isn't just words. The complex web of interdependencies that spans the world gives countries real power to exert influence on each other. Once some countries start to make a move towards serious carbon emission control strategies, then that becomes a bargaining chip to use in international negotiations. Will it work? Who knows. The only thing we know for sure is that the more carbon we pump into the atmosphere over the next few decades, the higher the impact from global warming will be (it's now guaranteed that there will be some impact - the carbon cycle is too long for us to prevent that at this late stage of the game).

Ideally you would want to adopt an emissions trading scheme along the lines of that used to curb sulphur emissions, but the wholehearted embrace of dodgy pseudoscience by far too many members of our Federal Opposition party spiked that particular gun.

So a carbon tax it is (for now, anyway). The hysterical cries of "Oh my god, you've doomed the country!" ring loudly from those that will bear most of the direct costs of actually internalising the negative effects their carbon emissions have on the wider environment. Their motives are, to say the least, a little bit suspect. The political opportunism involved in our Federal Opposition leader backing them is disappointing, but unsurprising.

In the PM's words
There's a nice summary of the current situation in a Fairfax opinion piece that went out over Gillard's byline:
The longer we leave it the harder action on climate change gets. This reform road is a hard one to walk. Just as doing nothing is not an option, we need to be careful to ensure that we do not make decisions that will cost our economy and jobs.
There's a definite risk that the government's plans won't live up to their ambitions. However, the world's past experience with the sulphur dioxide doomsayers should arouse some deep skepticism, just not skepticism directed towards those wanting to press forward with plans to curb carbon emissions.

Queensland's "unelected" leader of the opposition

Following the declaration of Brisbane's Lord Mayor that he is running for state parliament at the next election, and will be the leader of the opposition for that campaign, there is a trope making the rounds that he is somehow "unelected".

Out of curiosity, I decided to see how his numbers in the Brisbane City Council elections in 2008 stacked up against Anna Bligh's numbers in the 2009 state election.

The internet being what it is, official sources for these numbers weren't too hard to dig up:
I found the Lord Mayoral results on the Brisbane City Council site
I found the Electorate of South Brisbane results on the Electoral Commission Qld site (with the preferences taken into account here).

Results:
Campbell Newman received 335,076 first preference votes for Lord Mayor (60% of the total).
After distribution of preferences, he received a total of 339,320 votes (66% of the total).
It's probably worth noting that the second placed candidate received 29% of the first round votes, increasing to 34% after distribution of preferences, so Newman actually did receive the majority of the remaining preferences, there just weren't many to go around.

Anna Bligh received 12,243 first preference votes in her electorate of South Brisbane (48% of the total).
After distribution of preferences, she received a total of 14,697 votes (65% of the total).

So, our State Premier holds her position on the back of the support of her party and around 15k people living in or near South Brisbane.

The new leader of the opposition will hold that position on the back of the support of his party and around 340k people living in the City of Brisbane.

Since the current population of Qld is estimated at 4.5 million people, effectively 7.5% of the state voted for Newman to be Brisbane's Lord Mayor, while only 0.3% voted for Bligh to represent the seat of South Brisbane.
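Just to show my working, those percentages come straight from the figures already quoted above:

```python
# Percentages quoted above, derived from the post's own figures.
newman_votes = 339320      # Lord Mayor, after distribution of preferences
bligh_votes = 14697        # South Brisbane, after distribution of preferences
qld_population = 4500000   # estimated Queensland population

print("%.1f%%" % (100.0 * newman_votes / qld_population))  # 7.5%
print("%.1f%%" % (100.0 * bligh_votes / qld_population))   # 0.3%
```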

And people are calling Newman an unelected leader?

(Yes, I'm aware that the nature of parliamentary governments based on geographic representation means that most constituents don't get to vote directly for their parliamentary leaders. The only purpose of this post is to point out how dumb that makes the "unelected" gibe sound when it is aimed at the directly elected Lord Mayor of a city the size of Brisbane)

Thoughts and Impressions following PyCon 2011

I'm back home following my inaugural trip to PyCon, so it seems like a good time to record my impressions of the summits, conference and sprints.

I really enjoyed the whole experience - kudos to Van Lindberg, Jesse Noller and the rest of the volunteer team for organising everything. I'm glad to see the "apprenticeship" setup for the conference leadership continuing, with Jesse (the deputy coordinator this year) stepping up to coordinate Santa Clara, and the future coordinator for Montreal assisting during those two years.

The personal connection side of things was brilliant. When it comes to the folks that were already on python-dev before I really started to follow it back in late 2003, I've been interacting and working with them online for 8+ years, and there are of course many others that have joined python-dev in the intervening time. I'd seen a few of them in photos and videos, and heard a few in videos and podcasts, but by and large, this was the first time I had been able to connect faces and voices to names on the screen. Very cool stuff, including getting to meet Raymond Hettinger (who accepted my first patches back in '04, among them the looks-insane-but-it-works string hack to speed up the 2.x decimal module) and of course Guido himself (who was the one who actually granted me commit rights, essentially for making sense while arguing with him about PEP 340/346/343).

Getting ready for Pycon was actually also the motivation behind restarting this blog and adding it to Planet Python, finally getting myself a Twitter account (@ncoghlan_dev) and (after getting home) hooking my DISQUS profile up to that. They're all aspects of taking a bit more of an active part in the wider Python community after getting a taste of it at PyconAU last year (despite the fact that I have yet to make it to a BrisPy meeting... Wednesday night just isn't a good night for me these days).

From a more technical perspective, there were a few things that I found particularly interesting:

1. PyPy is definitely starting to make the transition from "experimental platform for research into dynamic language optimisation" to "let's use this to make production code go faster". This shows not only in their benchmark results, but also in their efforts to update their website to be more production-user friendly and the effort to get more major projects running on it at the sprints, including those that stress the boundaries of the distinction between the language definition and CPython implementation details (*cough*SQLAlchemy*cough*). One of those efforts actually revealed a bug in one of the dark corners of the CPython implementation (folks in Hanover F at the sprints may have heard me swearing about my various attempts at fixing that one...)

2. There is definite interest in supporting Python 3 in more modules and packages, as well as improving the available information out there regarding published packages. There's likely to be at least one after-the-fact PEP to better explain one of the major C API changes that bit some sprinters attempting to forward port zc.buildout (I think that was the affected package), there is collaboration developing amongst the Linux distros (and others) to get more existing packages on Python 3 (join the python-porting list if that project interests you), there are a couple of new sites with improved information on the level of Python 3 support in various modules, and the team behind djangopackages are working on providing the same service for the whole of PyPI (and, no doubt, Python 3 support will end up being one of the points of comparison).

3. With distutils2 entering the standard library as "packaging" in 3.3 (to reflect the scope creep in the mission of the package, as well as to avoid name conflicts with future backports of distutils2 post 3.3 release), it was fascinating listening to the sprinters discussing how to take their clean 3.3 code from the standard library and backport it (as distutils2) to run on 3.2, 3.1, 2.7, 2.6, 2.5 and 2.4 without cluttering the stdlib version with backwards compatibility cruft. If their results are a match for their goals, then their new 2to3 and 3to2 inspired tool may end up becoming the basis for a general purpose Python "backport" transformation technique that is able to iteratively downgrade Python code to support earlier versions, while still allowing the use of clean, idiomatic code in the latest version.

4. The understanding of how best to leverage the Mercurial transition is still evolving on python-dev. My personal opinion has now developed towards one where I hope we will start using more feature clones (rather than branches within the main repository), with the main cpython repository only being used to accept feature-complete (or near complete) contributions. We're actually pretty close to supporting that model now, it just needs a few tweaks to the way builds are pushed to the buildbots to get us the rest of the way to being able to trial code on the full buildbot fleet without having to push it into the main repository first.

5. Collaboration efforts between the 5 biggest Python implementations (CPython, PyPy, Jython, IronPython, Stackless) continue to evolve. The PSF announced $10k in direct funding to PyPy at the start of the conference proper, the main python.org download page now includes links to the sites of the other 4 major implementations, more contributors to the other projects were given CPython push rights to allow standard library fixes to be contributed upstream rather than maintained in downstream forks, there are plans in progress to create a collaborative "speed.python.org" modelled on the existing PyPy benchmark site and Brett Cannon plans to revive the PEP about splitting the standard library out to a separate source control repository.

6. Brian Curtin has a nice write-up welcoming several newcomers that made their first submissions to the CPython bug tracker at the sprints. Brett's idea of improving test coverage as an introductory activity is a *great* idea, since such changes are relatively easy to get started with, relatively easy to review, low risk of breaking anything (except the buildbots) and involve actually writing code. I'll also note here that Eric Snow spent the sprints working on a more esoteric idea that came out of a python-ideas discussion: see what would be involved in providing a variant on exec() that allowed an actual function body to be executed in a designated namespace.
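As a point of reference for that last item (my own sketch of the starting point, not Eric's actual work): exec() already accepts a caller-supplied namespace for code provided as a string, and the idea being explored is to offer something similar for the body of an ordinary function. The helper name shown below is purely hypothetical.

```python
# Today's behaviour: exec() runs a code string in a namespace you provide.
namespace = {}
exec("x = 1\ny = x + 2", namespace)
print(namespace["y"])   # 3

# The idea Eric was investigating (sketched with an invented helper name)
# would let a real function body play the same role, roughly:
#
#     def setup():
#         x = 1
#         y = x + 2
#
#     namespace = run_body_in_namespace(setup)   # hypothetical API
```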

I was also surprised (and somewhat concerned) at the number of people that perceived python-dev as a hostile, unwelcoming place. On further reflection, I realised there was actually some merit to that point of view, but that's a topic for another post.

Python Language Summit - Highlights

The language summit covered a fair bit more ground than the VM summit did, so this post only covers the topics that I personally found particularly interesting. My rough notes at least mention everything that was discussed.

Moar Speed
The question of speed, optimisations and benchmarking came up again. A lot of this was just retreading the ground from the VM summit with a wider audience. One thing that did become clearer is that the near-term points of contact for the speed.python.org concept are Maciej Fijałkowski from the PyPy team for the technical issues and Jesse Noller on the PSF side for the hosting/organisational issues (although I expect Jesse would welcome offers of assistance if anyone else, particularly existing PSF members, wanted to step up and help coordinate things).

Communicating the collective opinions of python-dev
Where are we going, what are we doing? The main communications channels for python-dev have historically been PEPs (including release PEPs), the What's New document for each release and of course the mailing list itself. The python-dev summaries project has been tried a couple of times, but generally burned out the people involved.

Doug Hellmann (PSF communications officer) would like to try a setup where there is an official python-dev technical blog where major discussions and decisions (including the outcomes of PEPs) can be presented in easier to swallow chunks, giving the gist of significant decisions and discussions, with references back to the source PEPs and mailing list threads.

It's an interesting idea, but, as Guido pointed out, will likely require *new* people to step forward to do it that are interested in the idea of helping to provide a window into the goings-on of python-dev (hopefully the more interesting parts, where we aren't just arguing about the colour of the current bikeshed du jour). From a personal point of view, I know I've only just really started using *this* blog to talk about my own perspective on Pythonic stuff. Something that may be practical is for the python.org technical blog to highlight blog posts where existing core devs are talking about new and upcoming stuff on their personal blogs.

Doug Hellmann is the point of contact for anyone interested in following up on this.

Policy on use of accelerator modules
There are a few unwritten policies regarding the use of implementation-specific accelerator modules to speed up parts of the standard library (such as "always test both versions", "the accelerated version should be API compatible with the Python version", "the interpreter should still work if the accelerated version is missing").

Brett Cannon has volunteered to write these down in an official policy PEP. While CPython is likely the main offender here, it will be suggested that other implementations follow the same policy for their own accelerator modules. Patches to bring CPython more in line with this policy, including providing pure Python alternatives to existing C-only modules, are definitely of interest.
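For anyone unfamiliar with the pattern being written down, it's essentially the optional-accelerator idiom the standard library already uses in a few places (heapq is one example): define the pure Python version first, then let a C version transparently take over if it can be imported. A simplified sketch (the accelerator module name below is invented):

```python
# Pure Python implementation is always present and fully functional...
def count_leading_zeros(data):
    count = 0
    for byte in data:
        if byte:
            break
        count += 1
    return count

# ...and the C accelerated version, if available, silently replaces it.
# Implementations without the accelerator still work, just more slowly.
try:
    from _hypothetical_speedups import count_leading_zeros  # invented name
except ImportError:
    pass
```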

Compatibility warnings
With the rise in significance of alternate implementations, some grey areas in the language definition (such as the reliance on refcounting semantics, abuse of the ability to store non-string keys in CPython namespaces, storing objects that implement the descriptor protocol in classes without considering the consequences) are potential sources for confusion when they break on other implementations (or potentially even in future versions of CPython).

ResourceWarning was added a while back to cover the refcounting issue, and uncovered a few bugs in CPython and its test suite. The proposal is to add CompatibilityWarning as a peer exception to ResourceWarning and use it for cases where people are relying on CPython implementation accidents that aren't officially supported by the language definition.

Nobody has stepped forward to write the PEP for this as yet, but it may make an interesting sprint topic (I know at least Brett and I are around for the sprints, and there should be a few other CPython core devs kicking around).
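If someone does pick it up, the mechanics would presumably look much like any existing warning category. A purely hypothetical sketch (neither the class nor the check below exists in any current Python release):

```python
import warnings

class CompatibilityWarning(Warning):
    """Hypothetical: relies on a CPython implementation accident."""

def check_namespace_keys(namespace):
    # Example check: non-string keys in a namespace happen to work on
    # CPython, but aren't guaranteed by the language definition.
    for key in namespace:
        if not isinstance(key, str):
            warnings.warn(
                "non-string key %r relies on a CPython implementation detail"
                % (key,),
                CompatibilityWarning,
                stacklevel=2,
            )

check_namespace_keys({"fine": 1, 42: "implementation dependent"})
```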

Better exception info
ImportError will likely acquire a "module" attribute during the sprints (this is an easy one, since it will just reference the module's name). There are other exceptions that could probably do with having the repr() of critical bits of information stored separately on the exception object (e.g. KeyError, ValueError, IndexError) for easy programmatic access.

Using repr() helps avoid issues with reference cycles keeping things alive longer than intended. However, the API for creating such enhanced exceptions would still need to be worked out, as well as how best to deal with cases where third party code has only populated the exception message without filling in the specific details. Technically, even ImportError isn't immune to that concern, as raising it is sometimes the responsibility of third party PEP 302 importers and loaders.
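By way of illustration (my own sketch, not anything that was agreed on), an enhanced KeyError along these lines might keep the repr() of the missing key as a separate attribute, rather than making callers parse the message:

```python
# Sketch only: store the interesting detail as a repr() on the exception,
# avoiding both message parsing and keeping the original object alive.
class DetailedKeyError(KeyError):
    def __init__(self, key):
        self.key_repr = repr(key)
        KeyError.__init__(self, "key not found: %s" % self.key_repr)

try:
    raise DetailedKeyError(("some", "missing", "key"))
except DetailedKeyError as exc:
    print(exc.key_repr)   # ('some', 'missing', 'key')
```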

Python Language Summit - Rough Notes

Same drill, different day, more people, more notes :)

Still just my interpretation, though. Will probably highlight a few things I find particularly interesting again tomorrow (as I did for the VM summit).

PSF Communications (Doug Hellmann)
- currently writing about PSF funding and similar activities
- would like to include more technical material summarising python-dev discussions
- how best to go about that
- new blog, not existing PSF blog
- existing PSF board not in the position to do it
- Guido: core devs already do a lot via PEPs and mailing list, likely not keen to write blog as well
- may be better to get others to do it, willing to follow discussions of interest
- posts may be primarily pointers to other resources (e.g. PEPs, mailing list posts)
- all implementations
- major new releases should go on python.org as NEWS items

Warnings for things that may cause problems on different implementations
- ResourceWarning helps to pick up reliance on CPython specific refcounting
- CompatibilityWarning for reliance on non-strings in namespaces (e.g. classes)
- update Language Spec to clarify ambiguous situations
- like ResourceWarning, silence CompatibilityWarning by default
- what to do about builtin functions that currently aren't descriptors (i.e. doesn't change behaviour when retrieved from a class)
- e.g. make staticmethod objects directly callable
- big gray area in language spec - callables may not be descriptors
- perhaps change CPython builtin descriptors to warn about this situation
- another use case for CompatibilityWarning
- Guido not convinced the builtin function problem can be handled in general
- a better callable variant of staticmethod may be better, that allows the default descriptor behaviour to be easily stripped from any function
- doesn't want to require that all builtin functions follow descriptor protocol, since it is already the case that many callables don't behave like methods
- a better staticmethod would allow the descriptor protocol to be stripped, ensuring such functions can be safely stored in classes without changing behaviour
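(A sketch of what such a callable staticmethod variant might look like - my own illustration, not anything settled at the summit:)

```python
# Illustration only: a staticmethod-style wrapper that is also directly
# callable, so wrapping a callable before storing it in a class body means
# attribute access won't turn it into a bound method.
class callable_staticmethod(object):
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, owner=None):
        return self.func                 # never binds to the instance
    def __call__(self, *args, **kwds):
        return self.func(*args, **kwds)  # still usable before class creation

class Example(object):
    helper = callable_staticmethod(len)

print(Example.helper("abc"))     # 3
print(Example().helper("abc"))   # 3 (no implicit self passed)
```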

Standard Library separation
- see VM summit notes
- over time, migrate over to separate repository for standard lib, update
- need Python and C modules stay in sync
- buildbots for standard library
- challenge of maintaining compatibility as standard lib adopts new language changes
- need a PEP to provide guarantees that C accelerators are kept in sync (Brett Cannon volunteered to write the PEP)
- bringing back pure Python alternatives to C standard library is encouraged, but both need to be tested
- accelerator modules should be subsets of the Python API
- Brett will resurrect standard library PEP once importlib is done
- full consolidation unlikely to be possible for 2.7 (due to CPython maintenance freeze)

Speed Benchmarking
- see VM summit notes
- really good for tracking performance changes across versions
- common set of benchmarks
- OSU OSL are willing to host it
- backend currently only compares two versions
- first step is to get up and running with Linux comparisons first, look at other OS comparisons later
- hypervisors mess with performance benchmarks, hence need real machines
- should set up some infrastructure on python.org (benchmark SIG mailing list, hg repository)
- eventually, redirect speed.pypy.org to new speed.python.org
- longer term, may add new benchmarks

Exception data
- need to eliminate need to parse error strings to get info from exceptions
- should be careful that checks of message content aren't overly restrictive
- PEP 3151 to improve IO error handling? (Guido still has some reservations)
- ImportError needs to name module
- KeyError, IndexError, ValueError?
- need to be careful when it comes to creating reference loops
- exception creation API also an issue, since structured data needs to be provided

Contributor Licensing Agreements
- Jesse and Van looking to get electronic CLAs set up
- will ensure adequately covers non-US jurisdictions

Google Summer of Code
- encouraging proposals under the PSF umbrella

Packaging
- distutils2 should land in 3.3 during the sprints
- namespace packages (PEP 382) will land in 3.3
- external name for backports should be different from internal name
- too late to introduce a standard top level parent for stdlib packages
- external backports for use in older versions is OK
- external maintenance is bad
- hence fast development cycles incompatible with stdlib
- want to give distutils2 a new name in stdlib for 3.3, so future backports based on version in 3.4 won't conflict with the standard version in 3.3

Python 3 adoption
- py3ksupport.appspot.com (Brett Cannon)
- supplements inadequate trove classifier data on PyPI with manual additions
- Georg Brandl has graphical tracker of classification data on PyPI over time
- Allison Randal/Barry Warsaw have been doing similar dependency tracking and migration info for Ubuntu
- giant wiki page for Fedora Python app packaging
- good dependency info would provide a good ranking system for effectively targeting grants
- 3.python.org? getpython3.com? need to choose an official URL
- funding may help with PyPy migration
- IronPython will be looking at 3.x support once 2.7 is available (this week/next week timeframe)
- Jython focused on 2.6 now, may go direct to 3.x after that (haven't decided yet)
- PSF funding needs a specific proposal with specific developer resources with the necessary expertise and available time
- CObject->Capsule change is a compatibility issue for C extension modules
- Django targeting Python 3 support by the end of summer
- zc.buildout is a dependency of note that hasn't been ported yet (Pycon sprint topic)
- other migration projects being tackled at Pycon sprints (webop?)

Python upstream and distro packaging
- PEP 394 - recommendations for symlinks practices
- PEP 3147 and 3149 were heavily targeted at helping distros share directories across versions
- namespace packages (PEP 382)
- PEP 384 stable ABI (done for 3.2)
- better tools needed to help with migration to stable ABI

Baseline Python distro installs
- system python varies in terms of what is installed
- challenging to target, as available modules vary
- "build from source" is only a partial answer as some build dependencies are optional
- distros make some changes to support differences in directory layouts
- some changes affect Python app dependencies (e.g. leaving out distutils)
- conflict between "system Python" use case of what is needed to run distro utilities and "arbitrary app target" for running third party apps
- distributing separate Python under app control is not ideal, due to security patch management issues
- specific problems are caused by removal of stuff from base install (e.g. distutils)
- other problem is when distro uses old versions of packages (but virtualenv can help with that)
- may help if a "python-minimal" was used for the essential core, with "python" installing all the extras (including distutils, tkinter, etc)
- then have a further python-extras (or equivalent) that adds everything else the distro needs for its own purposes
- distros tend to work by taking a CPython build and then splitting it up into various distro packages
- to handle additions, would be good to be able to skip site-packages inclusion in sys.path (ala virtualenv).
- "-S" turns off too much (skips site.py entirely, not just adding site-packages to sys.path)
- "-s" only turns off user site-packages, not system site-packages

Python 3.3 proposed changes to strings to reduce typical memory usage
- PEP 393 changes to internal string representation (implementation as GSoC project)
- Unicode memory layout currently split in order to more easily support resizing and subclassing in C
- need to build and measure to see speed and memory impacts
- alternative idea may be to explore multiple implementation techniques (similar to PyPy)

Speed (again!)
- Unladen Swallow dormant. Major maintainers moved on to other things, fair bit of work in picking it up
- even trying to glean piecemeal upgrades (e.g. to cPickle) is a challenge
- interest in speeding up Python has really shifted to PyPy
- for CPython, gains would need to be really substantial to justify additional complexity
- really need to get the macro benchmarks available on 3.x
- Guido: pickle speedup experience is to be cautious, even when the speed gains are large.
- speed hack attempts on CPython are still of interest, especially educational ones
- speeding up overall is a very hard problem, but fixing specific bottlenecks is good
- stable ABI will help
- PyPy far more sensitive to refcounting bugs than CPython
- static analysis to pick up refcounting bugs could help a great deal
- "Here there be dragons": Unladen Swallow shows that overall speedups are not easy to come by

Regex engine upgrade
- new regex library proposed
- added many new features, including the Unicode categories needed to select out Python 3.x identifiers
- potentially big hassle for other implementations since re module includes a lot of C
- IronPython currently translates to .NET compatible regexes, but could rewrite more custom code

GUI Library
- Guido: GUI libraries are nearly as complicated as the rest of Python put together and just aren't a good fit with the release cycle of the standard lib
- Don't want to add another one, but don't want to remove Tcl/Tk support either

twisted.reactor/deferred style APIs in the standard library
- asyncore/asynchat still has users
- would like to have an alternative in the stdlib that offers a better migration path to Twisted
- deferred could be added, such that asyncore based apps can benefit from it
- reactor model separates transport/protocol concerns far more cleanly than asyncore
- protocol level API and transport level API for asyncore may be a better option
- would allow asyncore based applications to more easily migrate to other async loops
- defining in a PEP would allow this to be the "WSGI" for async frameworks ("asyncref", anyone?) (Jesse suggested concurrent.eventloop instead)
- still need someone to step up to write the PEP and integrate the feedback from the Twisted team and the other async frameworks
- plenty of async programming folks able to help and provide feedback (including glyph)
- having this standardised would help make event loop based programming more pluggable
- Guido still doesn't like the "deferred" name
- Glyph considers deferred to be less important than standardising the basic event loop interface

Python VM Summit - Somewhat Coherent Thoughts

Yay, sleep :)

Last night I just dumped my relatively raw notes into a post. This review is more about distilling what was discussed over the day into a few key themes.

Speed Good

One major point was to do with "How do we make Python fast?". Dave Mandelin (Mozilla Javascript dev) was asking how open CPython was to people tinkering with JIT and other technologies to try and speed up execution, and it was acknowledged that python-dev's reaction to such proposals is rarely more than lukewarm. A large part of that resistance comes from the fact that CPython is generally portable to many more architectures than the real speed hacks (which are generally x86 + x86-64 + ARM at best, and sometimes not even all 3 of those). Unladen Swallow also lost a lot of steam, as so much of their effort was going into tasks not directly related to "make CPython faster" (e.g. fixing LLVM upstream bugs, getting benchmarks working on new versions).

Instead, we tend to push people more towards PyPy if they're really interested in that kind of thing. Armin decided years ago (when switching his efforts from psyco to PyPy) that "we can't get there from here", and it's hard to argue with him, especially given the recent results from the benchmarks executed by speed.pypy.org.

There was definitely interest in expanding the speed.pypy.org effort to cover more versions of more interpreters. We don't actually have any solid data in CPython regarding the performance differences between 2.x and 3.x (aside from an expectation that 3.x is slower for many workloads due to the loss of optimised 32 bit integers, additional encoding/decoding overhead when working with ASCII text, the new IO stack, etc). We aren't even sure of the performance changes within the 2.x series.

That last is the most amenable to resolution in the near term - the benchmarks run by speed.pypy.org are all 2.x applications, so the creation of a speed.python.org for the 2.x series could use the benchmarks as is. Covering 3.x as well would probably be possible with a subset of the benchmarks, but others would require a major porting effort (especially the ones that depend on twisted).

Champions and specific points of contact for this idea aren't particularly obvious at this stage. Jesse is definitely a fan of the idea, but has plenty on his plate already, so it isn't clear how that will work out from a time point of view. There'll likely need to be some self-organisation from folks that are both interested in the project and aren't already devoting their Python-related energies to something else.

The Python Software Foundation, not the CPython Software Foundation

The second major key point was the PSF (as represented by Jesse Noller from the board, and several other PSF members, including me, from multiple VMs) wanting to do more to support and promote implementations other than CPython. We are definitely at the point where all 4 big implementations are an excellent choice depending on the target environment:

  • CPython: the venerable granddaddy, compatible with the most C extensions and target environments, most amenable to "stripping" (i.e. cutting it down to a minimal core), likely the easiest sell in a corporate environment (due to age and historically closest ties to the PSF)
  • Jython: the obvious choice when using Python as a glue language for Java components, or as a scripting language embedded in a Java environment
  • IronPython: ditto for .NET components and applications
  • PyPy: now at the point where deployments on standard server and desktop environments should seriously consider it as an alternative to CPython. It's not really appropriate for embedded environments, but when sufficient resources are available to let it shine, it will run most workloads significantly faster than CPython. It even has some support for C extensions, although big ticket items like full NumPy support are still a work in progress. However, if you're talking something like a Django-based web app, then "CPython or PyPy" is now becoming a question that should be asked.

It didn't actually come up yesterday, but Stackless probably deserves a prominent mention as well, given the benefits that folks such as CCP are able to glean from the microthreading architecture.

Currently, however, python.org is still very much the CPython website. It will require a lot of work to get to a place where the other implementations are given appropriate recognition. It also isn't clear whether or not the existing pydotorg membership will go along with a plan to modernise the website design to something that employs more modern web technologies, and better provides information on the various Python implementations and the PSF. While the current site is better than what preceded it, a lot of pydotorg members are still gun shy due to the issues in managing that last transition (even the recent migration of the development process docs over to a developer-maintained system on docs.python.org encountered some resistance). However, when the broader Python community includes some of the best web developers on the planet, we can and should do better. (A personal suggestion that I didn't think of until this morning: perhaps a way forward on this would be to first build a new site as "beta.python.org", without making a firm commitment to switch until after the results are available for all to see. It's a pretty common way for organisations to experiment with major site revamps, after all, and would also give the pydotorg folks a chance to see what they think of the back-end architecture)

Standardising the Standard Library

Finally, with the hg transition now essentially done, efforts to better consolidate development effort on the standard library (especially the pure Python sections) and the associated documentation will start to gather steam again. As a preliminary step, commit rights (now more accurately called "push rights") to the main CPython repository are again being offered to maintainers from the other major interpreter implementations so they can push fixes upstream, rather than needing to maintain them as deltas in their own repositories and/or submit patches via the CPython tracker.

Python VM Summit - Rough Notes

In parallel with the 2 days of tutorials at Pycon, there are a couple of day long meetings for invited folks active in the evolution of the language itself. Today was the VM summit, which focuses on the major Python interpreter implementations (CPython, PyPy, Jython, IronPython), the current status of each, and where things are likely to head in the near- and long-term. (The Thursday session focuses more on the evolution of the language itself, as well as of the wider ecosystem).

CPython and PyPy both had multiple devs at the summit, IronPython and Jython devs were also there (although IronPython got to share theirs with CPython). We also had some Parrot VM folks there, as well as one of the Mozilla Javascript devs - a bunch of issues with VM development for dynamic languages apply across languages, despite differences in the surface syntax.

The notes below are probably too cryptic to make sense out of context, but will hopefully give the gist of what was discussed. These notes are my interpretation of what was said, and may or may not reflect what people actually meant. Names omitted to protect the guilty (and because I didn't write them down).

Commit rights for other VM core devs
  - good idea
  - did some of this last Pycon US
  - will look into adding more this week

Splitting out the standard library and test suite (again)
  - duplication of effort between CPython/IronPython/Jython/PyPy
  - shared commit rights intended to make it easier near term to use CPython as master, allowing bugs to be fixed "upstream"
  - hg transition should make sharing easier
  - main CPython release will stay "batteries included"
  - open to the idea of providing "CPython minimal" and "standard library" downloads (but much work to be done in defining a minimum set)
  - longer term, may want to separate pure-Python stdlib development from "C skills required" hacking on the CPython interpreter core and C accelerated implementation modules for the stdlib

Speed benchmarking
  - speed.pypy.org (very cool!)
  - benchmarks originally chosen by Unladen Swallow team
  - PSF may talk to OSU OSL about setting up speed.python.org
  - benchmark multiple versions of CPython, as well as Jython and IronPython
  - currently benchmarks are 2.x specific, may be a while before 3.x can be compared fully
  - may be GSoC projects in:
      - improving backend infrastructure to handle more interpreters
      - porting benchmarks to Python 3
  - can highlight key performance differences between the implementations (e.g slowspitfire vs spitfire-cstringio)

Python.org download pages
  - should start recommending alternative interpreters more prominently
  - PyPy likely to be faster for pure Python on major platforms
  - IronPython/Jython/CPython still best at integration with their respective environments (Java libraries, .NET libraries, C extensions)

Cool hacks
  - Maciej: PyPy JIT viewer
  - Dave Malcolm: CPython HEAP viewer in GDB 7
 
Parrot VM (and JIT for dynamic languages)
  - target VM for dynamic languages (primarily Perl 6 and Tcl at the moment)
  - loadable operations, loadable object types 
  - dynamic ops were original speed target, now moving towards dynamic types instead
  - exploring reducing number of core ops to make JIT more practical
  - looking into taking advantage of LLVM
  - Unladen Swallow blazed this trail, so LLVM has better dynamic language support
  - PyPy has tried and failed to use LLVM as an effective backend
  - some issues may have been fixed due to Unladen Swallow's efforts, but others still exist (e.g. problems with tail recursion)
  - SpiderMonkey similarly struggles with JIT and dynamic patching issues
  - GNU Lightning and LiveJIT projects noted, but nobody really familiar with them 
  - any future Python-on-Parrot efforts likely to focus on using PyPy frontend with Parrot as a backend
  - proof-of-concept written (for a thesis?) that used .NET as a backend target for PyPy
  - original Python-on-Parrot ran into problems due to semantic mismatches between Perl 6 and Python (reached the limits of the degree of difference the Perl 6 toolchain was willing to tolerate)
 
Role of the PSF
  - supports Python the Language, not just CPython the Reference Interpreter
  - could use additional feedback on how to better fulfill that role
  - getting the "boring stuff" done?
  - project-based grants, not blanket personal funding
  - project proposals requiring more funds than the PSF can provide are still valuable, as PSF can help facilitate co-sponsorships (however, still a novel concept - only been done once so far).

2.7 to 3.2
  - PyPy just reaching feature parity with 2.7
  - PyPy now becoming far more interesting for production usage
  - treat PyPy Python 3 dialect like a major Python library (e.g. sponsored by PSF)

CPython warnings for reliance on implementation details
  - ResourceWarning was a nice addition (detects reliance on refcounting for resource cleanup)
  - non-string keys in class namespaces would be another good candidate for a warning
  - clarifying finalisation-at-shutdown semantics would be nice (but fixing those semantics in CPython first would help with that)

What is a Python script?

This is an adaptation of a lightning talk I gave at PyconAU 2010, after realising a lot of the people there had no idea about the way CPython's concept of what could be executed had expanded over the years since version 2.4 was released. As of Python 2.7, there are actually 4 things that the reference interpreter will accept as a main module.

Ordinary scripts: the classic main module identified by filesystem path, available for as long as Python has been around. Can be executed without naming the interpreter through the use of file associations (Windows) or shebang lines (pretty much everywhere else).

Module name: By using the -m switch, a user can tell the interpreter to locate the main module based on its position in the module hierarchy rather than by its location on the filesystem. This has been supported for top level modules since Python 2.4, and for all modules since Python 2.5 (via PEP 338). Correctly handles explicit relative imports since Python 2.6 (via PEP 366 and the __package__ attribute). The classic example of this usage is the practice of invoking "python -m timeit 'snippet'" when discussing the relative performance of various Python expressions and statements.

Valid sys.path entry: If a valid sys.path entry (e.g. the name of a directory or a zipfile) is passed as the script argument, CPython will automatically insert that location at the beginning of sys.path, then use the module name execution mechanism to look for a __main__ module with the updated sys.path. Supported since Python 2.6, this system allows quick and easy bundling of a script with its dependencies for internal distribution within a company or organisation (external distribution should still use proper packaging and installer development practices). When using zipfiles, you can even add a shebang line to the zip header or use a file association for a custom extension like .pyz and the interpreter will still process the file correctly.
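As a minimal sketch of the zipfile variant (file names invented for illustration), bundling a __main__.py and then handing the archive straight to the interpreter looks like this:

```python
# Build a tiny executable archive containing a __main__.py ...
import zipfile

with zipfile.ZipFile("myapp.pyz", "w") as archive:
    archive.writestr("__main__.py", "print('Hello from inside the zipfile')")

# ... then either of the following will run the bundled __main__ module:
#   python myapp.pyz
#   python -c "import runpy; runpy.run_path('myapp.pyz', run_name='__main__')"
```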

Package name: If a package name is passed as the value for the -m switch, the Python interpreter will reinterpret the command as referring to a __main__ submodule within that package. This version of the feature was added in Python 2.7, after some users objected to the removal in Python 2.6 of the original (broken) code that incorrectly allowed a package's __init__.py to be executed as the main module. Starting in Python 3.2, CPython's own test suite supports this feature, allowing it to be executed as "python -m test".

The above functionality is exposed via the runpy module, as runpy.run_module() and runpy.run_path().
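A quick sketch of those two functions in action (the throwaway script below is created purely for the demonstration):

```python
import os
import runpy
import tempfile

# run_path() is the programmatic equivalent of passing a script, directory
# or zipfile on the command line.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write("print('Running as ' + __name__)")

namespace = runpy.run_path(script.name, run_name="__main__")
print("__file__" in namespace)   # True - these are the executed module's globals
os.unlink(script.name)

# run_module() is the programmatic equivalent of the -m switch, e.g.
#   runpy.run_module("timeit", run_name="__main__")
# mirrors "python -m timeit" (not invoked here, since timeit's main()
# finishes by raising SystemExit).
```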

If anyone ever sees me (metaphorically) jumping up and down about making sure things get mentioned in the What's New document for a new Python version, this is why. Python 2.6 was released in October 2008, but we didn't get the note about the zipfile and directory execution trick into the What's New until February 2010. It is described in the documentation, but really, who reads the command line documentation, or is likely to be casually browsing the runpy docs? This post turning up on Planet Python will probably do more to get the word out about the functionality than anything we've done before now :)