One of the more puzzling aspects of Python for newcomers to the language is the
stark usability differences between the standard library's urllib.request module
and the popular (and well-recommended) third party module, requests, when
it comes to writing HTTP(S) protocol clients. When your problem is
"talk to a HTTP server", the difference in usability isn't immediately obvious,
but it becomes clear as soon as additional requirements like SSL/TLS,
authentication, redirect handling, session management, and JSON request and
response bodies enter the picture.
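To make that contrast concrete, here's a rough sketch of an authenticated JSON
request in each library (the API URL and credentials are hypothetical): requests
folds auth, redirects, cookie handling, and JSON decoding into its Session API,
while urllib.request needs an opener assembled by hand:

    import json
    import urllib.request

    import requests

    # With requests: auth, redirects, cookies, and JSON decoding are built in.
    with requests.Session() as session:
        session.auth = ("user", "secret")
        response = session.get("https://api.example.com/items", timeout=10)
        response.raise_for_status()
        items = response.json()

    # With urllib.request: the equivalent request needs explicit handler
    # wiring and manual JSON decoding.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, "https://api.example.com", "user", "secret")
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr),
        urllib.request.HTTPCookieProcessor(),
    )
    with opener.open("https://api.example.com/items", timeout=10) as response:
        items = json.loads(response.read().decode("utf-8"))
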
It's tempting, and entirely understandable, to want to
chalk this difference
in ease of use up to
requests being "Pythonic" (in 2016 terms), while urllib.request
has now become un-Pythonic (despite being included in the standard library).
While there are certainly a few elements of that (e.g. some newer language
features were only added in Python 2.2, while urllib2 was included in the original
Python 2.0 release and hence couldn't take them into account in its API design),
the vast majority of the usability difference relates to an entirely different
question we often forget to ask about the software we use:
What problem does it solve?
That is, many otherwise surprising discrepancies between urllib.request and
requests are best explained by the fact that they solve different
problems, and the problems most HTTP client developers have today
are closer to those Kenneth Reitz designed
requests to solve in 2010/2011,
than they are to the problems that Jeremy Hylton was aiming to solve more than
a decade earlier.
It's all in the name
To quote the current Python 3
urllib package documentation: "urllib is a
package that collects several modules for working with URLs".
And the docstring from Jeremy's
original commit message adding
urllib2 to CPython: "An extensible library for opening URLs using a
variety [of] protocols".
Wait, what? We're just trying to write a HTTP client, so why is the documentation talking about working with URLs in general?
While it may seem strange to developers accustomed to the modern HTTPS+JSON powered interactive web, it wasn't always clear that that was how things were going to turn out.
At the turn of the century, the expectation was instead that we'd retain a rich variety of data transfer protocols with different characteristics optimised for different purposes, and that the most useful client to have in the standard library would be one that could be used to talk to multiple different kinds of servers (like HTTP, FTP, NFS, etc), without client developers needing to worry too much about the specific protocol used (as indicated by the URL schema).
In practice, things didn't work out that way (mostly due to restrictive institutional firewalls meaning HTTP servers were the only remote services that could be accessed reliably), so folks in 2016 are now regularly comparing the usability of a dedicated HTTP(S)-only client library with a general purpose URL handling library that needs to be configured to specifically be using HTTP(S) before you gain access to most HTTP(S) features.
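As a sketch of that original design goal, urlopen still dispatches on the URL
scheme today, so the same client code can fetch data over HTTP(S), over FTP, or
from the local filesystem (the URLs below are made-up examples):

    import urllib.request

    # urlopen() picks a protocol handler based on the URL scheme, so the
    # calling code doesn't change between protocols.
    for url in (
        "https://www.example.com/index.html",
        "ftp://ftp.example.com/pub/README",
        "file:///etc/hostname",
    ):
        with urllib.request.urlopen(url) as response:
            print(url, "->", len(response.read()), "bytes")
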
When it was written,
urllib2 was a square peg that was designed to fit into
the square hole of "generic URL processing". By contrast, most modern client
developers are looking for a round peg to fit into the round hole that is
HTTPS+JSON processing -
urllib2 will fit if you shave the corners
off first, but
requests comes pre-rounded.
So why not add requests to the standard library?
Answering the not-so-obvious question of "What problem does it solve?" then
leads to a more obvious follow-up question: if the problems that
urllib2 was designed to solve are no longer common, while the problems that
requests solves are common, why not add
requests to the standard library?
If I recall correctly, Guido gave in-principle approval to this idea at a
language summit back in 2013 or so (after the
requests 1.0 release), and it's
a fairly common assumption amongst the core development team that either
requests itself (perhaps as a bundled snapshot of an independently upgradable
component) or a compatible subset of the API with a different implementation
will eventually end up in the standard library.
However, even putting aside the
misgivings of the requests developers
about the idea, there are still some non-trivial system integration problems
to solve in getting
requests to a point where it would be acceptable as a
standard library component.
In particular, one of the things that
requests does to more reliably handle
SSL/TLS certificates in a cross-platform way is to bundle the Mozilla
Certificate Bundle included in the
certifi project. This is a sensible
thing to do by default (due to the difficulties of obtaining reliable access
to system security certificates in a cross-platform way), but it conflicts
with the security policy of the standard library, which specifically aims to
delegate certificate management to the underlying operating system. That policy
aims to address two needs: allowing Python applications access to custom
institutional certificates added to the system certificate store (most notably,
private CA certificates for large organisations), and avoiding adding an
additional certificate store to end user systems that needs to be updated when
the root certificate bundle changes for any other reason.
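The difference is visible in how each side sets up certificate verification: the
standard library asks the operating system for its trust store, while requests
defaults to the bundled certifi data. A minimal sketch of the two defaults
(certifi and requests are the third party packages named above):

    import ssl

    import certifi
    import requests

    # Standard library default: create_default_context() loads whatever CA
    # certificates the underlying operating system provides.
    stdlib_context = ssl.create_default_context()

    # requests default: verify against the Mozilla bundle shipped in certifi;
    # passing certifi.where() explicitly here just makes that default visible.
    response = requests.get("https://www.example.com/", verify=certifi.where())
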
These kinds of problems are technically solvable, but they're not fun to solve,
and the folks in a position to help solve them already have a great many other
demands on their time. This means we're not likely to see much in the way of
progress in this area as long as most of the CPython and requests developers
are pursuing their upstream contributions as a spare time activity, rather than
as something they're specifically employed to do.