Scripting languages and suitable complexity

Alyssa Coghlan

Steven Lott is a Python developer and blogger that I first came across via his prolific contributions to answering questions on Stack Overflow, and then later by reading his blog posts that appeared on Planet Python. His Building Skills in Python book is a resource I've recently started suggesting newcomers to Python take a look at to see if his style works for them.

Some time ago, he posted about what he called the "Curse of Procedural Design": the fact that, beyond a certain point, purely procedural code typically starts drowning in complexity and becomes an unmaintainable mess. Based on that, he then started questioning whether or not he was doing the right thing by teaching the procedural aspects of Python first and leaving the introduction of object oriented concepts until later in the book.

Personally, I think starting with procedural programming is still the right thing to do. However, one of the key points of object-oriented design is that purely procedural programming doesn't scale well. Even in a purely procedural language, large programs almost always define essential data structures, and functions that work on those data structures, effectively writing object oriented code by convention. Object support built into the language makes this easier but it isn't essential (as large C projects like the Linux kernel and CPython itself demonstrate).

Where languages like Java can be an issue as beginner languages is that by requiring that all code be compiled in advance and written in an object oriented style, they set a minimum level of complexity for all programs written in that language. Understanding even the most trivial program in Java requires that you grasp the concepts of a module, a class, an instance, a method and an expression. Using a compiled procedural language instead at least lets you simplify that a bit, as you only need to understand modules, functions and expressions.

But the scripting languages? By means of a read-eval-print loop in an interactive interpreter, they let you do things in the right order, starting with the minimum level of complexity possible: a single expression.

For convenience, you may then introduce the concept of a 'script' early, but with an appropriate editor, scripts may be run directly in the application (giving an experience very similar to a REPL prompt) rather than worrying about command line invocation.

More sophisticated algorithms can then be introduced by discussing conditional execution and repetition (if statements and loops), but still without any need to make a distinction between "definition time" and "execution time".

Then, once the concept of algorithms has been covered, we can start to modularise blocks of execution as functions and introduce the idea that algorithms can be stored for use in multiple places so that "definition time" and "execution time" may be separated.

Then we start to modularise data and the associated operations on that data as classes, and explore the ways that instances allow the same operations to readily be performed on different data sets.

Then we start to modularise collections of classes (and potentially data and standalone functions) as separate modules (and, in the case of Python, this can be a good time to introduce the idea of "compilation time" as separate from both "definition time" and "execution time").

Continuing up the complexity scale, modules may then be bundled into packages, and packages into frameworks and applications (introducing "build time" and "installation time" as two new potentially important phases in program execution).

A key part of the art of software design is learning how to choose an appropriate level of complexity for the problem at hand - when a problem calls for a simple script, throwing an entire custom application at it would be overkill. On the other hand, trying to write complex applications using only scripts and no higher level constructs will typically lead to an unmaintainable mess.

In my opinion, the primary reason that scripting languages are easier to learn for many people is that they permit you to start immediately with code that "does things", allowing the introduction of the "function" and "class" abstractions to be deferred until later.

Starting with C and Java, on the other hand, always requires instructors to say "Oh, don't worry about that boilerplate, you'll learn what it means later" before starting in with the explanation of what can go inside a main() function or method. The "compilation time" vs "execution time" distinction also has to be introduced immediately, rather than being deferred until some later point in the material. There's also the fact that such languages are actually usually at least two languages in one: the top level "compile time" language that you use to define your data structures, functions and modules and the "run time" language that you actually use inside functions and methods to get work done. Scripting languages don't generally have that distinction - the top level language is the same as the language used inside functions (in fact, that's my main criteria for whether or not I consider a language to be a scripting language in the first place).

Comments