This post will dig into what happens when a new class is created, at a bytecode level. This isn’t too interesting on its own, but one of my other posts that’s in the works has ended up being far too long, and so I’m breaking this out as a chunk that I can refer to.
As you may already know, class definitions in Python are executed like any other code. This means that you can execute arbitrary code in your definition if you so wish:
class Dog:
    print("In the process of creating class")
    def bark(self):
        print("woof")
This prints the message "In the process of creating class" once and only once, when the module that defines the class is first imported.
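One way to check the "once and only once" claim is to capture what gets printed while the class is defined and then instantiated a couple of times. This is only a sketch: the use of exec, the "demo" module name and contextlib.redirect_stdout are scaffolding added here so the output can be inspected, not something class creation needs.

```python
import contextlib
import io

source = '''
class Dog:
    print("In the process of creating class")
    def bark(self):
        print("woof")
'''

buffer = io.StringIO()
namespace = {"__name__": "demo"}  # hypothetical module name for the demo

with contextlib.redirect_stdout(buffer):
    exec(source, namespace)  # defining the class runs the class body
    namespace["Dog"]()       # instantiating does not run the body again
    namespace["Dog"]()

printed = buffer.getvalue().splitlines()
print(printed)
```

The captured list should contain the message a single time, however many instances are created.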
You can also conditionally define a class:
import random
if random.choice([True, False]):
    class Dog:
        def bark(self):
            print("woof")
If you do this, then 50% of the time you’ll get a working class, and the other 50% of the time you’ll get a NameError if you try to use it. This particular example is silly, but the same technique could be useful for OS-dependent classes that shouldn’t be available when the underlying OS doesn’t support them.
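As a rough illustration of the OS-dependent idea, the sketch below only defines a class when the running platform provides os.fork. The class name ForkingWorker and its method are hypothetical, chosen purely for the example.

```python
import os

# Only define the class where the OS actually supports forking.
if hasattr(os, "fork"):
    class ForkingWorker:  # hypothetical class, for illustration only
        def describe(self):
            return "this platform can fork"

# Code that uses the class has to cope with it not existing at all.
try:
    ForkingWorker
except NameError:
    worker_available = False
else:
    worker_available = True

print(worker_available)
```

Whether this prints True or False depends on the platform, which is exactly the point: the class statement is ordinary code that may or may not run.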
So what’s actually going on in the bytecode? Let’s disassemble it and find out. I’m going to assume that you already know the basics of how Python bytecode works. If not, you can see my article on it here.
First of all, we compile some code:
source = """
class Dog:
    def bark(self):
        print("woof")
"""
code = compile(source, '<string>', 'exec')
We can then disassemble it with dis.dis(code) (after an import dis) to find out what’s going on. If you’re using Python 3.6, this will print something like the following to STDOUT:
  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE
The basic form of this is reasonably familiar, but what’s the MAKE_FUNCTION talking about? This is code that makes a class, and yet the bytecode is making a function. And where’s the function bark? That one actually is a function, but it’s nowhere in sight when MAKE_FUNCTION is being invoked.
Let’s break it down. We can see this as:
  2           0 LOAD_BUILD_CLASS
              ... some other stuff ...
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE
What this does is load a special built-in function called __build_class__, set up some arguments (which we skip over for now), call the function with two arguments on the stack, assign the result to the name Dog and then return None. So the interesting things to consider are:
- What goes on inside __build_class__?
- What are the arguments that we pass to it?
What does __build_class__ do?
__build_class__ is barely documented, but it’s easy enough to find in the CPython source code. I won’t step through it line by line, but you can find it in Python/bltinmodule.c if you want to dig into the details.
The __build_class__ function takes at least two arguments (a function plus a name string), with optional arguments after that for base classes. Let’s ignore base classes for now.
The interesting part of the __build_class__ function is this:
cell = PyEval_EvalCodeEx(PyFunction_GET_CODE(func), PyFunction_GET_GLOBALS(func), ns,
NULL, 0, NULL, 0, NULL, 0, NULL,
PyFunction_GET_CLOSURE(func));
Here func is the function that was passed in to __build_class__, which is the mystery function we haven’t explained yet. The only other variable is ns, which is an empty Python dict.
This call evaluates some code. Specifically, it executes the code of the function func, in the context of the globals that func has access to, and using ns as the local namespace. The return value gets mostly ignored. If this function does anything at all, the interesting thing is in its side-effects on the dict ns.
Hint: side-effects on ns are very important to this.
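To get a feel for what "side-effects on ns" means, here is a small Python-level analogy. It uses the built-in exec as a stand-in for the C call above, which is an assumption made for illustration: the real machinery runs a function’s code object rather than a source string. The variable names are made up for the example.

```python
# Some code to run, playing the part of the mystery function's body.
body_source = "x = 1\ndef bark(self):\n    return 'woof'\n"

ns = {}                          # starts out empty, like ns in __build_class__
result = exec(body_source, {}, ns)  # empty globals, ns as the local namespace

print(result)      # the return value is not where the interesting part is
print(sorted(ns))  # every name the code assigned has landed in ns
```

After the call, ns holds the names x and bark, even though nothing useful was returned.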
After we’ve evaluated this mystery function, the ns dict is passed to the class’s metaclass. Metaclasses in Python get a bit confusing, so for now let’s ignore this detail and assume we’re using the default metaclass, which is type. Therefore what we’re doing is calling:
type("Dog", base_classes, ns)
You can think of this as a class instantiation: the class is type, and the instance we end up with is the Dog class. Dog is both a class and an instance: it’s a class for future instances like rex, rover and lassie, but is itself an instance of type.
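The three-argument form of type can be tried directly. In this sketch the namespace dict is built by hand rather than by running a class body, and bark returns its string instead of printing it so that the result is easy to check; both are choices made for the example.

```python
def bark(self):
    return "woof"

ns = {"bark": bark}        # a hand-made namespace
Dog = type("Dog", (), ns)  # name, tuple of base classes, namespace

rex = Dog()
print(isinstance(Dog, type))  # Dog is an instance of type
print(isinstance(rex, Dog))   # rex is an instance of Dog
print(rex.bark())
```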
What are the arguments we pass in to __build_class__?
We’ve figured out that __build_class__ takes a mystery function, evaluates it for side effects, then creates an instance of type using the resultant namespace. But what is the mystery function?
Let’s look again at that disassembly:
  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE
Specifically, we’ll look at the bit we skipped over before:
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
When MAKE_FUNCTION is called with an argument of 0, it’s the simplest case: it takes only two arguments from the stack, a code object and a name. So if we want to know about the function we’re creating (and ultimately calling inside of __build_class__) we need to look inside this code object to see what it’s doing.
The code object is loaded with LOAD_CONST 0, which means that we can find it in the tuple of constants associated with this code block:
dis.dis(code.co_consts[0])
gives:
  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('Dog')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 (<code object bark at 0x7fb6c7b77930, file "<string>", line 3>)
             10 LOAD_CONST               2 ('Dog.bark')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               3 (bark)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE
Now we’re getting somewhere. Suddenly this looks a bit more like the inside of a class definition. We’re loading up our method object and giving it the name "bark". The actual code for the method isn’t visible here; it’s stored in a constant nested inside the code object:
dis.dis(code.co_consts[0].co_consts[1])
gives:
  4           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('woof')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
You should recognise this bit as the innards of this method:
def bark(self):
    print("woof")
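Rather than indexing into co_consts by hand, the nesting can be walked recursively. The indices used above come from Python 3.6 and may differ on other versions, so this sketch finds nested code objects by type instead. The helper name nested_code_names is made up for the example.

```python
import types

source = """
class Dog:
    def bark(self):
        print("woof")
"""
code = compile(source, "<string>", "exec")

def nested_code_names(code_obj):
    """Return the name of a code object followed by the names of any
    code objects nested inside its constants, depth first."""
    names = [code_obj.co_name]
    for const in code_obj.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(nested_code_names(const))
    return names

names = nested_code_names(code)
print(names)
```

This should list the module-level code object first, then the Dog body, then bark: three code objects, each stored as a constant of the one before.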
So what are we actually saying here?
It’s got a bit confusing. Functions within functions within functions.
I think the key is to think of this class definition:
class Dog:
    def bark(self):
        print("woof")
as actually being a function dressed up:
def _Dog_namespace_creator():
    def bark(self):
        print("woof")
Creating the class works something like this:
- Compile a function that creates the Dog namespace (I’m calling this _Dog_namespace_creator for clarity, though it isn’t really called that).
- Execute this function, and keep hold of the resultant namespace. Remember that a namespace is just a dictionary. In our case, the namespace after executing this function contains a function called bark, alongside the __module__ and __qualname__ entries we saw being stored in the disassembly.
- Create an instance of type (or some other metaclass) using this namespace. The class will therefore have a method called bark in its namespace.
This is all done once, when the class is defined. None of this stuff needs to happen again when the class is instantiated.
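The three steps can be reproduced by hand, roughly, from Python itself. This is a sketch under some assumptions rather than what CPython literally does: exec stands in for the C-level evaluation, the globals dict with a "demo" module name is invented for the example, and metaclasses and base classes are ignored.

```python
import types

source = """
class Dog:
    def bark(self):
        print("woof")
"""
module_code = compile(source, "<string>", "exec")

# Step 1: the compiler has already produced the namespace-creating code;
# pick it out of the module's constants by type.
body_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))

# Step 2: run that code with an empty dict as its local namespace.
ns = {}
exec(body_code, {"__name__": "demo"}, ns)

# Step 3: create an instance of type from the resulting namespace.
Dog = type("Dog", (), ns)

print(sorted(ns))
Dog().bark()
```

Printing the namespace shows bark next to the bookkeeping entries such as __module__ and __qualname__ (newer Python versions may add a few more).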
Hmm. What’s the really short version?
The body of a class definition is a lot more like a function than you might think. It actually is a function, one that is executed when the class is defined and builds the members of the class.
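If you want to see this for yourself, the class body’s code object can be wrapped in a real function object and handed straight to __build_class__. Treat this as an experiment rather than a supported technique: builtins.__build_class__ is a CPython implementation detail, and the globals dict with its "demo" module name is made up for the example.

```python
import builtins
import types

source = """
class Dog:
    def bark(self):
        print("woof")
"""
module_code = compile(source, "<string>", "exec")
body_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))

# Build the function object by hand, much as MAKE_FUNCTION would.
function_globals = {"__name__": "demo", "__builtins__": builtins}
body_func = types.FunctionType(body_code, function_globals, "Dog")

# The same call the bytecode makes: a function plus a name string.
Dog = builtins.__build_class__(body_func, "Dog")

print(Dog)
Dog().bark()
```

Note that an ordinary def would not work as the first argument here, because a normal function keeps its local variables in fast slots rather than in the namespace dict; that is why the sketch reuses the compiler-generated class body.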