Python basics: Bytecode

I want to delve into some of the internals of Python, and in order to demonstrate what’s going on I’m going to be taking apart some bytecode. To avoid repeating myself, I’ll get some of the basics out of the way here.

Think about the process of executing a piece of Python code. The code starts off as human-readable text that you’ve written in your editor or IDE. What ultimately executes on your computer’s CPU is machine code, which is data in a highly specific form that triggers the correct operations in the CPU. You might use many languages, but they all have to drive the same CPU so ultimately they need to produce the same machine code “language”.

As an aside, machine code is what’s usually called “low-level”, which is really just a way of saying “doesn’t have any nice features to make it easy for humans”. If you want to support a new feature in machine code, you need to add it to the silicon of your CPU, and not just your CPU but every CPU that will ever execute the code. Obviously this is costly, so the only features that get added are ones that improve performance, not conveniences for humans. Instead of making machine code easier, we write in other languages (“higher-level”) that are convenient and transform these into machine code.

Anyway, the process of converting nice Python code (full of comments and readable variable names) into machine code is obviously complex. Complex code is hard to write and maintain, so it’s usually better to split a complex task into several simpler ones. In executing Python (and a lot of other languages) there are roughly three steps: parse the source code into a tree, compile that tree into bytecode, then execute the bytecode.

Briefly, the first step is to convert the source code into an Abstract Syntax Tree, or AST. I’ll talk more about that elsewhere, but for now just think of it as cleaning up the code and putting it in a more regular form. Comments are removed and details like blank lines, how many spaces you use for indentation, spaces after commas, unnecessary parentheses etc. are all discarded. It’s the same code, but only telling the Python program what it needs to know, nothing more.
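As a rough illustration of what gets discarded, the standard-library ast module can parse two differently formatted spellings of the same function and compare the resulting trees. The two source strings below are made up for this sketch.

```python
import ast

# Two spellings of the same function: one with a comment, a blank line,
# extra spaces and redundant parentheses; one without any of that.
messy = """
def add_up(a,    b):
    # add the two numbers

    return (a + b)
"""
tidy = "def add_up(a, b):\n    return a + b\n"

# ast.dump() renders a tree as text. By default it leaves out line and
# column positions, so only the structure of the code is compared.
same_tree = ast.dump(ast.parse(messy)) == ast.dump(ast.parse(tidy))
print(same_tree)  # True
```

The comparison comes out equal because comments, spacing and the unnecessary parentheses never make it into the tree.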

The step we’re interested in here is the compilation: the AST is converted into something a lot easier to execute, which is called bytecode. I think of it like this: the AST says what should be achieved (“iterate over this list”, say) and the bytecode says in simple steps how to do it (increment this variable, copy its value to here, compare these two values, etc.).
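The parse and compile steps can be triggered by hand with the built-in compile() function, which hands back a code object holding the bytecode. This is only a minimal sketch: the source string and the "<example>" filename are placeholders chosen for illustration.

```python
source = "total = 2 + 3\n"

# compile() parses the source and compiles it, returning a code object.
code = compile(source, "<example>", "exec")
print(type(code).__name__)    # code
print(type(code.co_code))     # <class 'bytes'>

# exec() performs the final step: it executes the compiled code object,
# here using an ordinary dict as the namespace for its variables.
namespace = {}
exec(code, namespace)
print(namespace["total"])     # 5
```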

The final step is where the code is actually executed: it takes input from the user, the network, files etc. and produces some useful output. Note that the first two steps only need to be repeated when you change the code, while the final execution step happens every time you run it. Python saves the output of the first two steps in a .pyc file (it does this for modules you import, not for the script you run directly), and reuses that output next time instead of repeating the steps.
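Python normally writes these cache files by itself when a module is imported, but the standard py_compile module can produce one on demand, which makes the saved output easy to see. A small sketch, using a throwaway directory and a made-up example.py:

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny source file (hypothetical name, purely for illustration).
    src = os.path.join(tmp, "example.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")

    # Parse and compile the file, saving the result next to it.
    # cfile is given explicitly so the output stays inside the temp directory.
    target = os.path.join(tmp, "example.pyc")
    pyc_path = py_compile.compile(src, cfile=target, doraise=True)

    pyc_exists = os.path.exists(pyc_path)
    pyc_size = os.path.getsize(pyc_path)
    print(os.path.basename(pyc_path), pyc_size, "bytes")
```

When Python creates the cache itself, the file usually lands in a __pycache__ directory beside the source, with the interpreter version in its name.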

Bytecode, machine code: what’s the difference?

I’ve been careful to use these two different terms, but I haven’t explained what the difference is. Is Python bytecode just a different term for machine code?

In theory, it could be. Python could generate machine code from the AST and run it directly on the CPU. But remember what I said above: machine code is built into the CPU and can’t be changed. Different computers (nowadays including your phone, watch, toaster or whatever) have different CPUs with different capabilities, but they might all want to run your Python code.

So what actually happens is that Python generates a Python-specific form of output that’s a bit like machine code, but not specific to the CPU. Then there’s a relatively simple piece of code for each CPU that can process this bytecode at the point of execution. This way there’s only one small piece of code that has to be different for each CPU, and everything else can work the same.

As usual, I’m skipping over some details here. But this should be enough to move on to my main point.

What Python bytecode looks like

Luckily, Python makes it really easy to see the bytecode that’s been generated for your code, without using any special tools. Python code can examine its own bytecode, using the dis module from the standard library.

Take a function like this:

def add_up(a, b):
    return a + b

Maybe in the past you’ve tried digging into the internals of a function object, and seen that it has some hidden stuff in the __code__ member:

>>> dir(add_up.__code__)
['__class__',
# ... other stuff ...
 'co_argcount',
 'co_cellvars',
 'co_code',
 'co_consts',
 'co_filename',
 'co_firstlineno',
 'co_flags',
 'co_freevars',
 'co_kwonlyargcount',
 'co_lnotab',
 'co_name',
 'co_names',
 'co_nlocals',
 'co_stacksize',
 'co_varnames']
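Several of these attributes are readable without any decoding. A quick sketch that prints a few of them for the same function:

```python
def add_up(a, b):
    return a + b

code = add_up.__code__
print(code.co_name)       # add_up
print(code.co_argcount)   # 2
print(code.co_varnames)   # ('a', 'b')
print(code.co_nlocals)    # 2
```

The argument names survive in co_varnames, which is how a disassembly can show (a) and (b) next to what would otherwise be bare numbers.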

That co_code member looks like it might contain the code, right?

>>> add_up.__code__.co_code
b'|\x00|\x01\x17\x00S\x00'

Looks like it might be the code, but it’s not in a format that we can understand. Not surprising really: as I discussed above, bytecode isn’t meant to be read by humans.

This is where dis comes in:

>>> import dis
>>> dis.dis(add_up)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

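This is something we can read. (The exact instructions vary between Python versions, so your output may differ slightly; newer releases show BINARY_OP rather than BINARY_ADD, for example.) It’s a series of instructions: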

  • Load the value from a
  • Load the value from b
  • Add the two loaded values
  • Return the value
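To connect the raw bytes in co_code with the readable listing, the bytes can be decoded by hand using the dis.opname table. This sketch assumes CPython 3.6 or later, where every instruction takes two bytes; the names printed depend on the Python version, and newer versions include extra bookkeeping instructions.

```python
import dis

def add_up(a, b):
    return a + b

raw = add_up.__code__.co_code

# Assumption: CPython 3.6+, where each instruction is two bytes long,
# an opcode followed by a one-byte argument.
decoded = [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

for name, arg in decoded:
    print(name, arg)
```

In practice dis.dis() or dis.get_instructions() is the better tool, since it also resolves arguments to variable names and handles details this sketch ignores.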

What exactly do I mean by “load the value”, though? The way to think of it is that Python bytecode doesn’t operate directly on variables; it copies the variable’s value into a working space first. The working space takes the form of a stack, which is a structure that’s a staple of computer science courses but not used all that much by typical programmers, so I’ll step through the details here.

A stack is a structure that can store any number of elements, but where only the element on the end (“top”) is easily accessible. This means that accessing, adding and removing elements at the top can be made very efficient, regardless of how many items the stack is storing. The other elements are still reachable, but only by first dealing with what’s on top and working your way down.
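A plain Python list behaves like a stack if you only ever touch its end. This is just an illustration of the idea with arbitrary values, not how the interpreter stores its own stack:

```python
stack = []

stack.append(10)      # push: 10 is now on top
stack.append(20)      # push: 20 is now on top, 10 sits underneath

top = stack[-1]       # peek at the top element without removing it
popped = stack.pop()  # pop: removes and returns the top element

print(top, popped, stack)  # 20 20 [10]
```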

When Python loads a variable into working storage, it copies the value to the top of the stack. For example, let’s assume that a=2 and b=3. After the first LOAD_FAST instruction above loads a, the stack holds just one value: 2.

Then the second instruction loads b, and the stack holds 2 with 3 on top of it.

Remember, only the top element (the 3) is directly accessible now.

The next instruction is an add. In Python bytecode that means it always takes the top two elements off the stack, adds them together and puts the result on the stack. It will do this regardless of how many other elements are on the stack; it doesn’t even see them. The result of executing this is to change the values on the stack: the 2 and the 3 are gone, replaced by a single value, 5.

Then the RETURN_VALUE instruction can simply pull the top element off the stack and return it to the calling function. Note how neatly this works out: the function started and ended with an empty stack, because the instructions that pull values off exactly balanced the instructions that push values on. You can imagine writing a different series of instructions that doesn’t achieve this, either leaving values on the stack at the end of the function or trying to pull off more values than the stack has. Python bytecode isn’t supposed to do this, and it can cause the interpreter to crash. Luckily the compiler always produces balanced bytecode, so barring a compiler bug this can only happen if you’re building your own bytecode (don’t worry, we’ll try that later…)
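The whole walk-through can be replayed with a toy stack machine. To be clear, this is a made-up miniature that borrows the instruction names from the listing; the run() function and the tuple-based program format are inventions for this sketch, not how CPython is implemented.

```python
def run(program, local_vars):
    """Run a tiny, made-up subset of stack instructions, logging the stack."""
    stack = []
    trace = []  # (instruction name, copy of the stack after it ran)
    for op, arg in program:
        if op == "LOAD_FAST":
            # Copy the variable's value onto the top of the stack.
            stack.append(local_vars[arg])
        elif op == "BINARY_ADD":
            # Take the top two values off and push their sum.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":
            # Pull the top value off and hand it back to the caller.
            result = stack.pop()
            trace.append((op, list(stack)))
            return result, trace
        else:
            raise ValueError(f"unsupported instruction: {op}")
        trace.append((op, list(stack)))
    raise RuntimeError("program ended without RETURN_VALUE")


program = [
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result, trace = run(program, {"a": 2, "b": 3})
for op, stack_after in trace:
    print(f"{op:<12} -> {stack_after}")
print("result:", result)
```

The trace shows the stack growing to [2], then [2, 3], shrinking to [5], and ending empty once the value is returned. An unbalanced program fails here with a tidy IndexError from popping an empty list, which is a much gentler outcome than the crash described above.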

There’s a lot more to it, but that’s the basic idea. Hopefully this gives you a better idea of what’s going on when you execute Python code.
