
Musings on the culture of python-dev

I mentioned at the end of my PyCon summary post that several people had told me that they find python-dev to be a hostile and unwelcoming environment, and that (after some reflection on the matter) I could actually see their point. We may be the model of civility compared to somewhere like the Linux Kernel Mailing List or (*shudder*) Blizzard's World of Warcraft forums, but mere civility is a far cry from being a consciously welcoming place.

I'll say up front that python-dev itself is unlikely to change any time soon. There are reasons it is the way it is, and I'll elaborate on some of them later in this post. In the meantime, python-ideas is available as a venue where outlandish ideas won't be rejected quite as abruptly, and the core-mentorship list has been set up specifically to provide a gentler introduction to the ways and means of core development without having to jump right in at the deep end by posting to python-ideas or python-dev. Questions about development with Python will continue to be redirected in fairly short order to python-list or python-tutor.

And, of course, what follows is just my opinion. Ask another veteran python-dev poster what they think, and you'll likely get a different answer. Ask an actual newcomer to (or lurker on) python-dev, and they'll probably have a different answer, too.

Python evolves too slowly!

You're changing the language too fast!

If I had to choose just one explanation for the frequency of abrupt responses in python-dev, the tension between the above two statements would be it. Compared to application-level open source projects, Python actually evolves quite slowly. 18+ months between minor version increments? 10 years between major versions? That's crazy! Canonical releases an entire new OS version every 6 months!

On the flip side, however, for a programming language definition, Python evolves quite fast. There's no such thing as a minor release for C or C++ - the last versions of those were C99 and C++98 respectively. We should see C++11 published by the end of the year, and C1X is still in development. CPython's own PEP 7 still mandates the use of C89 compatible constructs for portability reasons, even though C89 is older than one of our release managers.

Java hasn't had a major feature update since 2006, and even C# is only running at a new version every 2-3 years (with the last formally standardised version being 2.0 back in 2006).

Python also has a history of being more aggressive with deprecations than languages backed by large corporate sponsors. We only have limited volunteer resources, so rather than letting old code bitrot (or else take up maintenance time when it breaks), we'd prefer to rip it out in favour of an improved alternative. However, "more aggressive" is still pretty slow - some deprecated features stuck around for nearly 10 years (until Python 3 came out), and even a "fast" deprecation has historically taken at least 3 years (x.y contains the feature, x.y+1 deprecates it, x.y+2 removes it). It's highly likely that this deprecation period will be extended by a release for the 3.x series, pushing the minimum lifetime of a new feature that later proves to be a mistake out to nearly 5 years.

The task of updating the language and the standard library is a balancing act between those two forces - we want to make life easier for programmers adopting Python for new activities, while preserving backwards compatibility for existing applications. This is why Python 3 is such a big deal - most decisions are made with the emphasis on the needs of existing Python programmers, but the decision to create Python 3 was largely for the benefit of future Python programmers. That means that all current Python programmers are lumped with the task of actually managing a disruptive transition that wasn't really designed for their immediate benefit. Obviously, the collective opinion of python-dev is that it will be worth the pain in the long run, but those anticipated benefits don't make the migration any easier to deal with right now.

We get quite a few people coming into python-dev and betraying quite quickly that they don't have any respect for the time frames involved in language (rather than application) development. Telling someone "You're wrong, but explaining the real reasons why would require that I distil decades of experience down into a single email post and I don't feel like taking the time to do that right now, since even if I tried you would ignore me anyway" tends to be difficult to phrase politely.

A lot of the rest of this post is really just elaborations on the theme of why a programming language needs to evolve more slowly than most other pieces of software.

Heart of the ecosystem, but far from the whole of it

I've elaborated on the cost of change before, when discussing the design principle "Status Quo Wins a Stalemate". The core point is that any significant change made by python-dev, even one that will ultimately be a positive one, imposes a high near-term cost as the implications of the change ripple out across the whole Python ecosystem. As noted above, newcomers can easily perceive this as high-and-mighty arrogance rather than the voice of experience.

What do you mean by "cognitive burden"?

Even without considering the near-term cost of changes, every addition to the language (and even the standard library) imposes a potential burden on anyone learning the language in the future. You can't just say, "Oh, I won't worry about learning that feature" if the code base you've been asked to maintain uses it, or if it is offered as an answer to a query posted on python-list or Stack Overflow or the like. The principle of "There Should Be One - and preferably only one - Obvious Way To Do It" is aimed squarely at reducing the cognitive load on people trying to learn the language. Quite clearly, the "only one" aspect is an ideal rather than a practical reality (two kinds of string formatting and three argument parsing libraries in the standard library all say "Hi!"), but in such cases we do try to indicate that the most recently added (and hopefully least quirky) approach is the preferred way to do it.

This idea is also encountered as the aphorism "Not every three line function needs to be a builtin". Again, new posters may not take kindly to being told that their idea simply doesn't cut it as a potential language addition.

Gee, how dumb are you lot? Why don't you just...?

Another favourite bugbear is posters that bounce into python-dev assuming that we're a collection of clueless idiots that can't see the obvious. Collectively, we do pay quite a bit of attention to what other language communities are doing, as well as having personal experience with what does and doesn't work in actual programming practice. There's a reason that "new" Python features are generally modelled on something that has been demonstrated to work elsewhere (e.g. list comprehensions et al inspired by Haskell, the with statement partially inspired by C++ RAII, the new string formatting syntax inspired by C# string formatting).

New posters that give the list a tiny bit of credit and do us the courtesy of at least asking "Has this been thought of or discussed before? Are there any problems with it that I haven't considered?" tend to get significantly more positive reactions than those that start with a tone closer to "Here is my awesome idea, and you are seriously dumb if you don't get it and decide to adopt it immediately!". Positive responses are even more likely if ideas are posted to the right list (i.e. python-ideas).

You do remember you didn't have to pay a cent for this, right?

A fortunately rare (but still annoying when it arises) source of negative reactions is the end user that comes in demanding to know why certain things aren't being done, when the answer is "Because nobody stepped up to either do it themselves, or to pay for someone else to do it". It's a pretty simple equation, really, and not demonstrating understanding of it suggests a complete disregard for the volunteer nature of so many of the contributions that have been made to Python over the years.

In the end, we're still just people

We like painting bikesheds (or, more to the point, we can't always help ourselves, even when we know better). We like to be right and "win" arguments (or sometimes we simply need time to process and properly understand the point someone else is trying to make). Even those of us who are paid by our employers to work with Python are typically still participating in python-dev and hacking on CPython in our spare time rather than as part of the job, so there's not a lot of tolerance for "noise" and "time wasting".

As I'm a firm believer in the phrase "Vigorous criticism is the only known antidote to error", there are limits to how much I would personally want the culture of python-dev to change. Moving "blue sky" dreaming to python-ideas, "how does the process work?" coaching to core-mentorship, and VCS management issues to python-committers allows each of those lists to develop a culture appropriate to its specific activity, freeing python-dev to really focus on the "vigorous criticism" part of the story. Keeping that from crossing the line into excessive negativity and a total reluctance to change is an ongoing challenge, but hopefully an awareness of that danger and the occasional pause for reflection will be enough to keep things on the right track.

The benefits (and limitations) of PYC-only Python distribution

This Stack Overflow question hit my feed reader recently, prompting the usual discussion about the effectiveness of PYC-only distribution as a mechanism for obfuscating Python code.

PYC Only Distribution

In case it isn't completely obvious from the name, PYC-only distribution is a matter of taking your code base, running "compileall" (or an equivalent utility) over it to generate the .pyc files, and then removing all of the original .py source files from the distributed version.

Plenty of Python programmers (especially the pure open source ones) consider this practice an absolute travesty and would be quite happy to see it disallowed entirely. Early drafts of PEP 3147 (PYC Repository Directories) in fact proposed exactly that - in the absence of the associated source file, a compiled PYC file would have been ignored.

However, such blatant backwards incompatibility aroused protests from several parties (including me), and support for PYC-only distribution was restored in later versions of the PEP (although "compileall" now requires a command line switch in order to generate the files in the correct location for PYC-only distribution).
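
For anyone that wants to experiment with the workflow described above, a minimal sketch looks something like the following (assuming Python 3.2 or later, where compileall gained the option to write the .pyc files alongside the source rather than into __pycache__ directories; the directory name is purely illustrative):

    import compileall
    import os

    package_dir = "build/myapp"  # hypothetical staging copy of the code base

    # Emit .pyc files next to the source, in the location the import system
    # expects when no .py file is present
    compileall.compile_dir(package_dir, legacy=True)

    # Strip the original source files, leaving only the compiled versions
    for root, dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(".py"):
                os.remove(os.path.join(root, name))

The same layout is available from the command line via compileall's "-b" switch (i.e. "python -m compileall -b build/myapp", followed by deleting the .py files).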

Use Cases

As I see it, there are a couple of legitimate use cases for PYC-only distribution:
  • Embedded firmware: If your code is going onto an embedded system where space is at a premium, there's no point including both your source code and the PYC files. Better to just include the compiled ones, as that is all you really need
  • Cutting down on support calls (or at least making the ones you do get more comprehensible): Engineers and scientists like to tinker. It's in their nature. When they know just enough Python to be a danger to themselves and others, you can get some truly bizarre tickets if they've been fiddling with things and failed to revert their changes correctly (or didn't revert them at all). Shipping only the PYC files can help make sure the temptation to fiddle never even arises

Of the two, the former is by far the stronger use case. The latter is attempting a technical solution to a social problem and those rarely work out well in the long run. Still, however arguable its merits, I personally consider deterrence of casual modifications a valid use case for the feature.

Drawbacks

Stripping the source code out of the distribution does involve some pretty serious drawbacks. The main one is that you lose the ability to fall back on recompilation when the magic cookie embedded in the .pyc files doesn't match the interpreter that is trying to run them.
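
As a rough illustration of why that matters, the check the interpreter performs is essentially a comparison of the first four bytes of the .pyc file against the magic number of the running interpreter (the sketch below uses importlib.util.MAGIC_NUMBER, the modern spelling; older releases exposed the same value via imp.get_magic()):

    import importlib.util

    def pyc_matches_this_interpreter(pyc_path):
        # The first four bytes of a .pyc file identify the bytecode format
        # it was compiled for; the interpreter refuses anything else
        with open(pyc_path, "rb") as f:
            return f.read(4) == importlib.util.MAGIC_NUMBER

With the source available, a mismatch just triggers recompilation; without it, the mismatch means the module simply won't load.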

This restricts practical PYC-only distribution to comparatively constrained environments that can ensure a matching version of Python is available to execute the PYC files, such as:
  • Embedded systems
  • Corporate SOEs (Standard Operating Environments)
  • Bundled interpreters targeting a specific platform

Cross-platform compatibility of PYC files (especially for 32-bit vs 64-bit and ARM vs x86) is also significantly less robust than the cross-platform compatibility of Python source code.

Limitations

Going back to the SO question that most recently got me thinking about this topic, the big limitation to keep in mind is this: shipping only PYC files will not reliably keep anyone from reading your code. While comments do get thrown away by the compilation process, and docstrings can be stripped with the "-OO" option, Python will always know the names of all the variables at runtime, so that information will always be present in the compiled bytecode. Given both the code structure and the original variable names, most decent programmers are going to be able to understand what the code was doing, even if they don't have access to the comments and docstrings.
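
A quick way to see just how much survives compilation is to disassemble a compiled function (the function here is invented purely for the sake of the example):

    import dis

    def calculate_invoice_total(line_items, tax_rate):
        subtotal = sum(item.price * item.quantity for item in line_items)
        return subtotal * (1 + tax_rate)

    # The parameter and local variable names are stored on the code object...
    print(calculate_invoice_total.__code__.co_varnames)
    # -> ('line_items', 'tax_rate', 'subtotal')

    # ...and the disassembly reads like an outline of the original source
    dis.dis(calculate_invoice_total)

Shipping only the .pyc file makes no difference to what the code object contains - the same names and structure are right there for anyone who cares to look.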

While there aren't any currently active open source projects that provide full decompilation of CPython bytecode, such projects have existed in the past and could easily exist again in the future. There are also companies which provide Python decompilation as a paid service (decompyle and depython are the two that I am personally aware of).

Alternatives

You can deter casual tinkering reasonably well by placing your code in a zip archive with a non-standard extension (even .py!). If you prepend an appropriate shebang line, you can even mark it as executable on POSIX-based systems (see this post for more information).
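
The trick works because zip archives are read from the end of the file, so arbitrary data (such as a shebang line) can safely sit in front of them. A minimal sketch, with purely illustrative paths, and assuming the source directory contains a top-level __main__.py:

    import os
    import stat
    import zipfile

    def build_executable_archive(source_dir, target):
        with open(target, "wb") as out:
            # Anything before the zip data is ignored by zip readers
            out.write(b"#!/usr/bin/env python\n")
            with zipfile.ZipFile(out, "w") as archive:
                for root, dirs, files in os.walk(source_dir):
                    for name in files:
                        path = os.path.join(root, name)
                        archive.write(path, os.path.relpath(path, source_dir))
        # Mark the result as executable for POSIX systems
        os.chmod(target, os.stat(target).st_mode | stat.S_IEXEC)

    # build_executable_archive("myapp", "myapp.py")  # needs myapp/__main__.py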

You could also write your code in Cython or RPython instead of vanilla Python and ship fully compiled executable binaries.

There are minifier projects for Python (such as mnfy) that could be fairly readily adapted to perform obfuscation tricks (such as replacing meaningful variable names with uninformative terms like "_id1").
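
As a taste of what such an adaptation might look like, here's a deliberately naive sketch that renames the parameters and local variables of top-level functions. It does no real scope analysis (nested functions, global/nonlocal declarations and the like would need considerably more care), and ast.unparse is only available in Python 3.9 or later:

    import ast

    def obfuscate_locals(source):
        """Rename the parameters and local variables of each top level
        function to uninformative names like _id1, _id2, ..."""
        tree = ast.parse(source)
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                _rename_locals(node)
        return ast.unparse(tree)

    def _rename_locals(func):
        # Collect the names bound inside the function: the parameters plus
        # any simple assignment targets
        bound = [a.arg for a in func.args.args]
        for node in ast.walk(func):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                if node.id not in bound:
                    bound.append(node.id)
        mapping = {name: "_id%d" % (i + 1) for i, name in enumerate(bound)}
        # Rewrite every matching parameter and name reference
        for node in ast.walk(func):
            if isinstance(node, ast.arg) and node.arg in mapping:
                node.arg = mapping[node.arg]
            elif isinstance(node, ast.Name) and node.id in mapping:
                node.id = mapping[node.id]

Combined with stripping comments and docstrings, that kind of transformation makes recovered code noticeably harder to follow, but as with PYC-only distribution it remains a deterrent rather than real protection.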