Anonymous functions in Python

Posted: 2004-12-30 08:40

Python currently supports anonymous functions using the lambda keyword. This is a rather ugly beast, and I've yet to find anyone who actually likes the syntax, rather than tolerating it because they want to be able to use anonymous functions. It also forces non-mathematicians to learn that mathematicians and functional programmers seem to like calling anonymous functions lambdas, for reasons known only to them.

GvR has stated that he wants to get rid of lambda for Python 3.0. His main reasons seem to be that he dislikes the restriction to a single expression, and that he dislikes the current syntax. The question, then, is whether a more Pythonic syntax for anonymous functions can be found to replace the current lambda, and whether the restriction to a single expression is really a problem.

I believe Python 2.4's generator expressions provide good guidance on the correct attitude towards 'anonymous functions as expressions'. Generator expressions are related to for loops similarly to the way anonymous functions are related to full function definitions. Firstly, generator expressions are restricted to a single expression in the "body" of the for loop. They also include an implied yield statement. That is, the following two pieces of code are equivalent (neglecting namespace effects):

sum(x * x for x in seq)

def squares(seq):
  for x in seq:
    yield x * x
sum(squares(seq))
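
As a quick check of that equivalence, here is a minimal sketch that runs both forms on a small sample list. The contents of seq are an illustrative assumption, and the last two lines simply show that a generator expression hands out its values one at a time.

```python
# Illustrative input (an assumption); any iterable of numbers would do.
seq = [1, 2, 3, 4]

def squares(seq):
    # Explicit generator: one yield per item.
    for x in seq:
        yield x * x

# Generator expression: the same loop with an implied yield.
total_expr = sum(x * x for x in seq)
total_func = sum(squares(seq))

# A generator expression is a lazy iterator, producing one value per request.
gen = (x * x for x in seq)
first = next(gen)
```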

Nobody seems to complain about the fact that generator expressions are restricted to a single expression inside the for loop. Instead, they are extremely happy about the fact that they can do their simple for loop inside an expression instead of breaking it out into a separate generator. I believe the same attitude should apply to anonymous functions - if something is simple enough to express with a single expression, it may be simple enough to embed inside another expression (such as a function call). If it cannot be expressed with a single expression, it is almost certainly too complicated to be embedded inside another expression.

The other argument for restricting anonymous functions to a single expression is conceptual integrity. Python distinguishes between suites (which contain statements), statements (which contain expressions, statements or suites) and expressions (which contain only other expressions). Allowing a suite inside an anonymous function would break the concept of "expressions can only contain other expressions". Even if it turned out to be possible, it would do horrible things to Python's grammar, and open the door to some seriously unreadable code. Even when restricted to a single expression, overuse of lambdas can already lead to incomprehensible code (abuse of generator expressions can also lead to unreadable code, so don't take that last comment as an argument against allowing anonymous functions).
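
The expression-only rule can be observed directly. The sketch below assumes a Python 3 interpreter with the standard ast module; the compiles helper is a hypothetical name introduced only for this illustration. It shows that a lambda parses as a single expression, and that putting a statement in its body is rejected by the parser.

```python
import ast

# A lambda parses in "eval" mode, so the whole construct is an expression.
tree = ast.parse("lambda x: x + 1", mode="eval")
is_lambda = isinstance(tree.body, ast.Lambda)

def compiles(source):
    # Hypothetical helper: True if `source` is syntactically valid Python.
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# An expression body is accepted; a statement (return) in the body is not.
expression_body_ok = compiles("f = lambda x: x + 1")
statement_body_ok = compiles("f = lambda x: return x + 1")
```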

Even if you agree with me that restricting anonymous functions to a single expression is legitimate, that still leaves the question of "Why not just define a named function?". This is certainly GvR's standard response when questioned about the removal of lambda in Python 3.0. To my mind, the killer app for a clean anonymous function syntax is lazy evaluation of function arguments - only performing a calculation if the function actually needs that value. Another, rarer, use is the ability to have a generator expression which yields a sequence of functions. My examples will be based on these two use cases.

The standard mechanism for lazy evaluation in Python is to write a function that accepts a zero-argument callable instead of the argument we want lazily evaluated. If the function actually needs the relevant argument, it invokes the callable and uses the returned value. This approach is very clean when the caller has a function on hand that produces the desired result. When they do not, the caller must create a zero-argument function to be passed as the lazy argument. This is generally either a lambda or a named function created specifically for the purpose. Removing lambda eliminates the choice - you must use a named function. That approach, however, gets rather silly if the caller has the actual desired argument value on hand:

accepts_lazy_arg(lambda: val)

def ret_val():
  return val
accepts_lazy_arg(ret_val)
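
The post never shows the receiving side, so here is a minimal sketch of what a function like accepts_lazy_arg might look like. The needed flag and the expensive function are assumptions made purely for illustration; the point is only that the callable is invoked if and when the value is required.

```python
def accepts_lazy_arg(lazy_arg, needed=True):
    # Hypothetical body: call the zero-argument callable only when
    # the value is actually required.
    if needed:
        return lazy_arg()
    return None

calls = []

def expensive():
    # Stand-in for a costly computation; records that it ran.
    calls.append("ran")
    return 42

skipped = accepts_lazy_arg(expensive, needed=False)  # expensive() never runs
used = accepts_lazy_arg(expensive)                   # expensive() runs once

# Wrapping a value that is already on hand, as in the lambda form above.
val = "already computed"
wrapped = accepts_lazy_arg(lambda: val)
```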

The canonical use cases for lazy evaluation, of course, are short circuiting versions of functions which implement conditional expressions and switch statements.
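
A switch built this way might look like the sketch below. The names switch and branch, and the dictionary-of-callables layout, are assumptions chosen for illustration rather than anything standard; the evaluated list is there only to show which branches actually ran.

```python
def switch(key, cases, default):
    # `cases` maps keys to zero-argument callables. Only the selected
    # branch is ever called, so the other branches cost nothing.
    return cases.get(key, default)()

evaluated = []

def branch(name):
    # Hypothetical helper: build a lazy branch that records when it runs.
    def run():
        evaluated.append(name)
        return name.upper()
    return run

result = switch("b", {"a": branch("a"), "b": branch("b")}, branch("other"))
fallback = switch("z", {"a": branch("a")}, branch("other"))
```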

Moving on to the second use case, consider the toy problem of creating a list of incrementors - functions that add differing amounts to their arguments. With anonymous functions, this can be done with an expression; without them, it requires several statements:

funcs = [(lambda x: x + i) for i in range(10)]

def incrementors():
  for i in range(10):
    def incrementor(x):
      return x + i
    yield incrementor
funcs = list(incrementors())
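
One caveat when running either version: Python closures look up i when the function is called, not when it is created, so once the loop has finished every incrementor sees the final value of i. A common workaround is to bind the current value as a default argument. The sketch below assumes Python 3 and uses illustrative variable names.

```python
# Closures look up `i` at call time, so all ten functions see the last value.
late = [(lambda x: x + i) for i in range(10)]
late_results = [f(0) for f in late]

# Binding `i` as a default argument captures its value for each function.
bound = [(lambda x, i=i: x + i) for i in range(10)]
bound_results = [f(0) for f in bound]
```

The same default-argument trick applies to the nested def in the generator version.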

There are legitimate use cases for anonymous functions. I don't use them very often, but when I do use them, it would be a genuine pain to work around not having them. So I would be very disappointed to see them disappear completely in Python 3.0. However, where I agree with GvR entirely is that the current syntax is as ugly as sin - I sometimes don't use lambda when it might be useful, simply because it is so ugly and un-Pythonic. That means it must be time to move on to a syntax proposal.

The proposed syntax is based on the idea of functions as mappings from tuples of arguments to tuples of results. In mathematical terms, a function maps from a given domain (e.g. the Cartesian product of the real numbers) to a given range (e.g. negative pi inclusive to positive pi exclusive). Anonymous functions cover only those cases where the result tuple can be obtained from the input tuple using Python expressions. If you need something more complex, switch to using a named function (just as generator expressions require you to switch to a named generator if either the desired result or the filter condition cannot be written as Python expressions).

One of the problems with lambda is that we already have a perfectly good function keyword in def. So the proposed syntax uses def in an expression context (where it is currently illegal). Another problem I personally have with lambda is that it embeds a colon in the middle of an expression, which I find makes it difficult to parse the rest of the expression. So the proposed syntax uses a new keyword, "to", instead of the colon. The existing pseudo-keyword "as" was considered, but it is already overloaded with enough uses, and the word "to" better fits the above interpretation of the meaning of anonymous functions. One of GvR's criticisms of lambda that I agree with is that it doesn't require parentheses around its argument list. So the proposed syntax requires parentheses around the argument list. Parentheses surrounding the entire anonymous function will also be required, even when it is the sole argument to a function call. This avoids ambiguity problems with returning a tuple from the anonymous function. All of which gives something like the following equivalent pieces of code:

accepts_func((def (a, b, c) to f(a) + g(b) - h(c)))

def f1(a, b, c):
  return f(a) + g(b) - h(c)
accepts_func(f1)

The proposed syntax can be read as "define an anonymous function from arguments a, b and c to the result f of a plus g of b minus h of c". Or, in a shorter form, "def from a, b and c to f a plus g b minus h c". An earlier version of this post actually contained a bug in the named function version - I had incorrectly used the name 'f', accidentally creating a recursive function. Anyway, using the proposed syntax, the examples above become:

accepts_lazy_arg((def () to val))

funcs = [(def (x) to x + i) for i in range(10)]

An idea worth toying with is whether the argument list and the to keyword should be optional when the argument list is empty. This makes calls to functions which take lazy arguments extremely clean - just take whatever the argument would have been using immediate evaluation, prepend "(def " and append ")". However, that approach may not be possible given the constraints of CPython's simple parser.

Finally, no discussion that covers lazy evaluation would be complete without showing how conditional expressions with short-circuiting behaviour would look. The example usages assume a syntax which allows "() to" to be omitted.

def either(condition, true_case, false_case):
  if condition:
    return true_case()
  else:
    return false_case()

print either(A == B, (def "A equals B"), (def "A does not equal B"))
either(thefile, (def thefile.close()), (def 0))
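
Since the def expressions above are proposed syntax and will not run, here is a sketch of the same idea using today's lambda, assuming Python 3. The values of A and B are illustrative, and the division by zero is there only to show that the unselected branch is never evaluated.

```python
def either(condition, true_case, false_case):
    # Call only the branch selected by the condition.
    if condition:
        return true_case()
    else:
        return false_case()

# Illustrative values (an assumption).
A, B = 1, 2
message = either(A == B, (lambda: "A equals B"), (lambda: "A does not equal B"))

# Short-circuiting: the unselected branch is never called,
# so the division by zero below never happens.
safe = either(False, (lambda: 1 / 0), (lambda: 0))
```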

Note that this proposal does not add any new capability to the Python language. Instead, it merely aims to provide a more Pythonic syntax for the existing lambda expressions, so that anonymous functions can be retained in the hypothetical Python 3.0.