What kind of language is Python?

Programming languages are complex things, and over time they get refined, extended and eventually abandoned. In this post, I'm going to run through some of Python's core characteristics, compare them with some other languages, and demystify a few head-scratching words on the way. For example, I explain what a .pyc file is, and what it means to say that Python is written in C. In fact, if Python is written in C, why don't we just write everything in C in the first place? With an understanding of the design choices made by the Python developers, it is possible to see why Python operates in the way that it does, and also to see how programming concepts have evolved. Think of it as programming archaeology.

  • Python is a high-level language, not a low-level language. This means that Python is written using an abstracted syntax designed for humans to read. Examples of human-readable syntax are “if-then-else” and “while-do”, and abstracted in this case means independent of the hardware. These abstracted structures are easily understood by humans because they are close to natural language (e.g. English), but they rest on a lot of assumptions that must be stated explicitly before a computer can act on them. The advantage of high-level programs is that they are faster to write, and can be moved between different computers. High-level languages are also often slower to run, because they must be converted into machine-readable code before being executed, and they often sacrifice efficiency for programming simplicity. An example of trading efficiency for simplicity is automatic memory management, of which garbage collection is the best-known form. In Python, there is an automated system for recycling memory used by expired objects, such as the data for a logged-out user, so the programmer doesn’t have to manually deallocate memory. The automatic garbage collection system requires memory and resources to run, which means there is less available for the core program. In comparison to a high-level language, a low-level language, such as assembly, is one that closely follows the commands used by the hardware (also known as the instruction set). Programs that are written in low-level languages are faster to run, because there is less overhead from supporting tools, such as garbage collection, but more difficult to write, because these convenient tools aren’t available, so there is more code to write. As well as high- and low-level languages, there are also languages like C that are often described as mid-level: they incorporate elements of direct hardware control, such as manual memory access, but also provide convenient shortcuts for common mathematical tasks.
  • Python is an interpreted language, not a compiled language. This is a commonly-held view that is correct, but only from a certain perspective, and it doesn’t adequately explain some of the complexities of running Python. More accurately, in CPython, which is the most common Python implementation, source code is automatically compiled into byte code and then interpreted at runtime. The byte code is saved to a .pyc file when Python imports a module (a script that is run directly is compiled in memory without a .pyc file being saved), and it is derived from a parse tree. Byte code is a type of intermediate code that is faster for computers to process than source code, without being tied to a particular processor type. A parse tree, or Abstract Syntax Tree (AST), is a translation of Python into a logically-equivalent structure that expands and simplifies the code, making it easier for the code to be understood by a computer. Once the byte code has been produced, it can be passed to the Python virtual machine, which interprets the code - i.e. executes the program. The Python virtual machine is discussed further below.
  • Python is strongly typed, not weakly typed. A strongly typed language is one where variable types are important when performing operations that use the variable. For example, an integer ‘1’ cannot be added to a string ‘geonaut’, because there is no way of adding a number and a string, and there is no obvious candidate for conversion. In weakly-typed languages, some automatic conversion can take place, to allow variables of different types to interact. The advantage of weakly-typed languages is that they are more flexible and forgiving, so code can be written faster. The disadvantage is that a variable with the wrong type might cause a strange bug that is hard to find, because it is not clear what type a variable should be, or where the incorrect variable type is being defined.
  • Python is dynamically-typed, not statically-typed. Dynamically-typed languages are those which allow the type of a variable to change at runtime, whereas statically-typed languages (e.g. Java) do not allow the type of a variable to change once it is declared. For example, in a dynamically-typed language a variable might contain a string at one point in the code, and be changed to an integer later on. Statically-typed languages can detect errors earlier, during compilation, as invalid operations are more easily identified. Dynamically-typed languages can skip steps in the compilation process, and allow late-binding, which is discussed below.
  • Python is reflective. Reflection is the ability of a computer program to examine, introspect, and modify its own structure and behaviour at runtime. Practically, this means that it is possible to find out the type of an object or variable, and to look at the available methods and properties. This is a concept that started to disappear when high-level languages, such as Fortran, were first introduced, but has been resurrected in modern high-level languages, such as Python.
  • Python is late-binding, not early-binding. Late-binding and duck-typing are different aspects of the same concept; namely that methods and functions are looked up by name at runtime, rather than during compilation. This is a feature of dynamically-typed languages, which can modify the program at runtime, and might not have enough information to determine which method will be called during compilation. Duck-typing refers to judging an object by its methods and properties rather than by its declared type - i.e. if it looks like a duck, walks like a duck and quacks like a duck, it’s a duck.
  • Python is a multi-paradigm language. This means that Python can be used for object-oriented, procedural and functional programming. Procedural programming describes computation as a sequence of commands for the computer to perform. Object-oriented programming is imperative programming extended with the ability to package data and related functionality as classes. Functional programming is based on lambda calculus. Functional languages treat functions like any other value, such as a built-in type. This means functions can be passed to and returned from other functions as values, and can even be created at runtime.
  • Python is an imperative language. Imperative programming is a paradigm in which a program spells out how the desired outcome should be achieved. This can be contrasted with declarative programming, which doesn’t say how a program should work, but focusses on the desired output. Many OOP languages, such as Java and PHP, are also imperative. SQL is an example of a declarative language.
  • Python is written in C. This is true if you consider Python to mean CPython, rather than one of the other implementations. CPython is written in C, and that means two things: firstly, that the functions in Python are all made out of building blocks (aka primitives) written in C, and secondly, that the Python compiler and process virtual machine are written in C. They are written in C because work on CPython began in 1989, before C++ was widespread, so C was the natural choice at the time. It is interesting to wonder why Python is still written in C, considering C is such an old language. The answer to that is probably that once enough primitives for Python classes and objects are written in C, those can be used for writing the rest of the interpreter, so you wouldn't gain anything by using C++ instead.
  • Python has a first-class object model. This means objects may be named by variables, may be passed as arguments to procedures, may be returned as the results of procedures and may be included in data structures. The practical outcome of this is that everything is an object, and objects and functions can be created on the fly.
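To make the memory management point from the high-level bullet a little more concrete, here is a minimal sketch of an object being recycled once nothing refers to it any more. The `Session` class and the `session_lifecycle` function are hypothetical names invented for this example, and the immediate clean-up shown relies on CPython's reference counting; other implementations may reclaim the object later.

```python
import weakref


class Session:
    """Hypothetical stand-in for the data held for a logged-in user."""

    def __init__(self, username):
        self.username = username


def session_lifecycle():
    session = Session("geonaut")
    # A weak reference lets us watch the object without keeping it alive.
    watcher = weakref.ref(session)
    alive_before = watcher() is not None
    # Drop the only strong reference, as if the user had logged out.
    del session
    # Assumption: CPython frees the object as soon as its reference
    # count reaches zero, so the weak reference is now dead.
    alive_after = watcher() is not None
    return alive_before, alive_after


before, after = session_lifecycle()
print(before, after)  # on CPython: True False
```

Notice that nothing in the code deallocates the memory by hand; the programmer only stops referring to the object.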
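The compile-then-interpret pipeline described in the interpreted-language bullet can be followed by hand using the standard library. This is only a sketch of the in-memory steps: the one-line `source` string is made up for illustration, and no .pyc file is written because nothing is imported.

```python
import ast
import dis

# A made-up one-line program to push through the pipeline.
source = "total = 1 + 2 * 3\n"

# Step 1: source text is parsed into an Abstract Syntax Tree.
tree = ast.parse(source)

# Step 2: the tree is compiled into a code object that holds the byte code.
code = compile(tree, "<example>", "exec")

# Step 3: the virtual machine executes the byte code.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)   # Module
print(type(code).__name__)   # code
print(namespace["total"])    # 7

# Show the byte code in readable form; the exact instructions
# differ between Python versions.
dis.dis(code)
```

The raw byte code itself lives in `code.co_code` as a `bytes` object, which is essentially what ends up inside a .pyc file alongside some metadata.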
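The strong typing and dynamic typing bullets are easy to mix up, so here is a short sketch showing both at once. The function name `add_number_to_string` and the variable names are just illustrative choices.

```python
def add_number_to_string():
    """Try to add an integer to a string and report what happens."""
    try:
        return 1 + "geonaut"
    except TypeError as error:
        # Strong typing: Python refuses to guess at a conversion.
        return f"TypeError: {error}"


print(add_number_to_string())

# The conversion has to be requested explicitly.
print(str(1) + "geonaut")  # 1geonaut

# Dynamic typing: the same name can refer to a string, then an integer.
value = "geonaut"
first_type = type(value).__name__
value = 42
second_type = type(value).__name__
print(first_type, second_type)  # str int
```

So the name `value` has no fixed type, but each object it refers to does, and operations between mismatched objects fail rather than being silently converted.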
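Reflection, late-binding and duck-typing can all be seen in a few lines. The `Duck` and `Robot` classes and the `make_it_speak` function below are hypothetical, invented purely for this sketch.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # Duck-typing: no type check, anything with a speak() method will do.
    # Late-binding: 'speak' is looked up by name when this line runs.
    return thing.speak()


duck = Duck()

# Reflection: ask the object about itself at runtime.
print(type(duck).__name__)     # Duck
print(hasattr(duck, "speak"))  # True
public_names = [name for name in dir(duck) if not name.startswith("_")]
print(public_names)            # ['speak']

print(make_it_speak(duck), make_it_speak(Robot()))  # Quack Beep

# Reflection also covers modification: replace the method at runtime.
Duck.speak = lambda self: "Honk"
print(make_it_speak(duck))     # Honk
```

The last two lines are the reason the lookup cannot be done during compilation: the method that `speak` refers to changed while the program was running.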
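As a rough illustration of the multi-paradigm and first-class object bullets, the same calculation (a sum of squares) is written below in procedural, object-oriented and functional styles, followed by functions being created at runtime and stored in a data structure. All the names here (`SquareSummer`, `make_multiplier`, `operations`) are made up for the example.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural: a sequence of commands that updates a running total.
total = 0
for n in numbers:
    total += n * n


# Object-oriented: data and related behaviour packaged as a class.
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)


# Functional: functions passed as values to other functions.
functional_total = reduce(
    lambda acc, n: acc + n,
    map(lambda n: n * n, numbers),
    0,
)


def make_multiplier(factor):
    """Create and return a new function at runtime."""
    def multiply(x):
        return x * factor
    return multiply


# First-class objects: functions stored in a dictionary like any other value.
operations = {"double": make_multiplier(2), "triple": make_multiplier(3)}

print(total, SquareSummer(numbers).total(), functional_total)  # 30 30 30
print(operations["double"](5), operations["triple"](5))        # 10 15
```

All three styles sit happily in one file, which is really what multi-paradigm means in practice.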