How import works differently in Python 3.5

I messed up at work recently. We (a small start-up) have a staging server and a production server. Nothing gets deployed into production until it’s run on staging for a while and seems to be solid. But still I managed to mess this up.

The problem started when I upgraded the deployment from Python 3.4 to 3.5. This can be done without downtime (I just create a fresh VM behind the load balancer and then cut over), but it’s still a little risky. Long story short, I ended up cutting over staging but leaving the production server for a little while to wait for an appropriate maintenance window. Technically staging and production are different, but what does it matter? We’re not using any advanced Python features, or so I thought…

As it happens, while we were in this situation I hit a Pylint warning because one of my modules went over the length limit. It’s a bit of a silly warning, but I decided to attempt to break the file down rather than just ignore it. But of course the two resultant modules (models and query_sets in my case) ended up depending on each other. I ended up doing:

from . import models

in one module and:

from . import query_sets

in the other one. This means that the two modules only see each other’s members in module-qualified form (e.g. models.Foo) rather than in bare form, which is actually an advantage in my opinion, since it starts to emphasise separation between the modules.

Unfortunately, this worked fine in my staging setup (Python 3.5) and then died when it was deployed to production (Python 3.4). I decided to dig into why this is.

Interlude: How circular dependencies (don’t) work in Python

If you already understand the ramifications of circular dependencies in Python you can skip this bit.

Everyone knows there’s a difference between:

from mymodule import Foo

and

import mymodule

What a lot of people don’t realise is that this goes a lot deeper than just how you qualify Foo when you use it in your code.

First thing to understand: Python recurses through imports depth-first. The reason that circular imports don’t cause an infinite loop is that each module is added to sys.modules at the beginning of processing, and if the module is already in sys.modules then the import step is skipped (though namespace modifications are still made to the local module, which will be important later). This means that in a circular import, the recursion is cut off as soon as it hits a module that’s already been visited, even if the module is not yet completely imported (because it’s further up the stack).
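To see the sys.modules machinery cut a cycle off, here's a small self-contained experiment. The module names widget and gadget are made up for the demo; it writes two throwaway mutually-importing modules to a temp directory:

```python
import os
import sys
import tempfile
import textwrap

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "widget.py"), "w") as f:
    f.write(textwrap.dedent("""
        import sys
        # This module is registered in sys.modules before its body finishes.
        assert "widget" in sys.modules
        import gadget
        VALUE = 1
    """))
with open(os.path.join(tmp, "gadget.py"), "w") as f:
    f.write(textwrap.dedent("""
        import widget  # already in sys.modules, so loading is skipped
        # widget is only partially initialised at this point:
        assert not hasattr(widget, "VALUE")
        READY = True
    """))
sys.path.insert(0, tmp)
import widget

print(widget.VALUE)         # by now widget is fully populated
print(widget.gadget.READY)
```

The assertion inside gadget.py is the interesting bit: when gadget runs, the second import of widget is cut short, and gadget gets a reference to a widget module that doesn't yet have VALUE in it.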

Second thing to understand: A Python module is (among other things) a namespace, which in Python really means it’s got a dict that maps from names to objects (functions, classes, occasionally bare variables) contained in the module.

Third thing to understand: Definitions in a Python module are executed one at a time, and added to the module’s namespace as they are executed. At the top of the file the namespace is empty. As the definitions are processed it gets updated until by the bottom of the file it contains all that we expect it to.

Now, this sequence doesn’t normally matter because we can usually treat an import statement as a single atomic step from the point of view of the code calling the import. The fact that the module namespace starts out empty and gets filled up step by step doesn’t matter, because we only see the namespace once it’s populated.

Which brings us to the fourth thing to understand: import mypackage and from mypackage import * do completely different things as regards namespaces. The statement:

import mypackage

creates a reference to the module’s namespace (with the name mypackage) in the local module. This is an alias to the real namespace, so as the mypackage namespace gets populated, the changes to that are seen in real time by the code that imported it.

However, if you do:

from mypackage import *

then you copy the objects from the mypackage namespace into the local namespace. This is a one-time operation: once the copy has been made, subsequent changes to the mypackage namespace aren’t reflected in the importing module’s namespace. How could they be?
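The difference is easy to demonstrate without even doing a real import, using a hand-built module object as a stand-in:

```python
import types

# A module object standing in for a real imported module.
mod = types.ModuleType("mymodule")
mod.Foo = "original"

ns = {}
ns["mymodule"] = mod  # what `import mymodule` binds: the module itself
ns["Foo"] = mod.Foo   # what `from mymodule import Foo` binds: a copy

mod.Foo = "updated"   # simulate the module being populated later

print(ns["mymodule"].Foo)  # "updated"  -- the alias tracks the module
print(ns["Foo"])           # "original" -- the copy is frozen in time
```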

Again, you don’t really care about this sequencing until you hit a circular import. Circular imports matter a lot, because if widget imports gadget, which imports widget, then the second import of widget will be skipped, and gadget’s body will execute against a partially-populated widget module.

All of this means that if you’re in a situation with circular imports, you should avoid trying to import objects directly into the local namespace. Instead, import whole packages and rely on the fact that by the time your code actually executes, the packages will be populated.

Why this broke my code in production

When I introduced the circular dependency I was aware of all the above issues, and was careful to import whole packages rather than attempting to bring package contents into scope. I thought this would protect me. However, I was still using the from ... import form of the import statement, which turns out to be an issue.

Here’s the C code that implements from ... import in Python 3.4:

static PyObject *
import_from(PyObject *v, PyObject *name)
{
    PyObject *x;
 
    x = PyObject_GetAttr(v, name);
    if (x == NULL && PyErr_ExceptionMatches(PyExc_AttributeError)) {
        PyErr_Format(PyExc_ImportError, "cannot import name %R", name);
    }
    return x;
}

This gets called after the module has been loaded from disk and compiled. In our case, since we’re deep in a recursive call when this happens, Python will try to load the module and find it is already (in the process of being) loaded. It then calls import_from to make the namespace alterations.

If you’re not familiar with Python internals you can ignore most of the details, but notice that this is basically just a call to getattr() with some error checking. The getattr() is going to fail because the module has been loaded, but the parent package’s namespace hasn’t been updated (which doesn’t happen until the module loading is complete).

This failure is unnecessary. The module exists in sys.modules, so it’s perfectly possible for the system to load it from there even if it’s not part of the parent package’s namespace yet. In fact, this is what is done in Python 3.5, following this bug report, which discusses the details. The equivalent code is now:

static PyObject *
import_from(PyObject *v, PyObject *name)
{
    PyObject *x;
    _Py_IDENTIFIER(__name__);
    PyObject *fullmodname, *pkgname;
 
    x = PyObject_GetAttr(v, name);
    if (x != NULL || !PyErr_ExceptionMatches(PyExc_AttributeError))
        return x;
    PyErr_Clear();
    pkgname = _PyObject_GetAttrId(v, &PyId___name__);
    if (pkgname == NULL) {
        goto error;
    }
    fullmodname = PyUnicode_FromFormat("%U.%U", pkgname, name);
    Py_DECREF(pkgname);
    if (fullmodname == NULL) {
        return NULL;
    }
    x = PyDict_GetItem(PyImport_GetModuleDict(), fullmodname);
    Py_DECREF(fullmodname);
    if (x == NULL) {
        goto error;
    }
    Py_INCREF(x);
    return x;
 error:
    PyErr_Format(PyExc_ImportError, "cannot import name %R", name);
    return NULL;
}

Again, you can ignore a lot of the details here, but notice the pattern: First it tries getattr(), which will work in most cases. If that doesn’t work, it builds up the full module name in dotted form (by appending .modulename to the parent package name) and looks it up in sys.modules.
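If the C is hard to read, the same logic can be sketched in Python roughly like this. This is a paraphrase of the pattern, not the actual implementation, and fakepkg is a made-up name for demonstration:

```python
import sys
import types

def import_from(module, name):
    """Rough Python rendering of the 3.5 behaviour (a sketch only)."""
    try:
        return getattr(module, name)  # works in most cases
    except AttributeError:
        pass
    # The 3.5 fallback: build the dotted submodule name and
    # look it up in sys.modules.
    fullname = "%s.%s" % (module.__name__, name)
    try:
        return sys.modules[fullname]
    except KeyError:
        raise ImportError("cannot import name %r" % name)

# Exercise the fallback path: an attribute missing from the parent
# package, but present in sys.modules under its dotted name.
pkg = types.ModuleType("fakepkg")
sub = types.ModuleType("fakepkg.sub")
sys.modules["fakepkg.sub"] = sub
print(import_from(pkg, "sub") is sub)  # True
```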

How does coverage measurement work in Python?

If you’re writing serious Python you’re hopefully testing it, and if you’re doing any serious testing you are hopefully measuring how much code coverage you have.

But how does code coverage measurement actually work? If I execute a test function and get back a result, how can I possibly tell which lines of code were executed to get that result? And can this be done in pure Python, or is compiler magic involved?


Assertion rewriting in Pytest part 4: The implementation

Part 1, part 2 and part 3 looked at some background to why Pytest does something unusual and the principles behind how it works. Now it’s time to look into the implementation.

First it’s worth fleshing out how the AST stuff fits in to the original motivation of getting better assertions. Our original problem was that we had something like:

assert number_of_the_counting == 3

but we want more diagnostic information to be written out on failure. We could make the original test author write:

if not (number_of_the_counting == 3):
    print("Failed, expected number_of_the_counting == 3")
    print("Actual value=%s" % number_of_the_counting)

but there’s no chance that test authors are going to write this on every assertion they write. We saw how Python code can be converted into a tree structure, and how that tree structure can be converted into code we can execute. Hopefully you can see the possibility to convert the first form into a tree structure, modify the tree structure to represent the second form, then execute the result. Working on the tree structure makes it possible for us to do this without tearing our hair out.
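To make that concrete, here’s a toy transformer along those lines using the standard ast module. This is my own minimal sketch, nothing like Pytest’s real rewriter, and it needs Python 3.9+ for ast.unparse:

```python
import ast

class ToyRewriter(ast.NodeTransformer):
    """Turn `assert test` into `if not test: print(<failure message>)`."""
    def visit_Assert(self, node):
        # Build a message from the source of the test expression.
        msg = ast.Constant(value="Failed: " + ast.unparse(node.test))
        replacement = ast.If(
            test=ast.UnaryOp(op=ast.Not(), operand=node.test),
            body=[ast.Expr(value=ast.Call(
                func=ast.Name(id="print", ctx=ast.Load()),
                args=[msg], keywords=[]))],
            orelse=[],
        )
        return ast.copy_location(replacement, node)

tree = ToyRewriter().visit(ast.parse("assert number_of_the_counting == 3"))
ast.fix_missing_locations(tree)
exec(compile(tree, "<demo>", "exec"), {"number_of_the_counting": 4})
# prints: Failed: number_of_the_counting == 3
```

The source string never said anything about print; we parsed it, rewrote the tree, and executed the modified version.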

With that background established, let’s look at Pytest, which actually does this stuff for real. The code we’re interested in is in _pytest.assertion.rewrite, starting with the class AssertionRewritingHook. This class is an import hook as defined in PEP 302. This means that when this hook is registered, any subsequent call to import triggers code in this class.

There are actually two different objects involved in Python imports, finders and loaders. The finder receives a module name and figures out which file this relates to. The loader actually loads the given file and creates a Python module object. As is common, the AssertionRewritingHook does both jobs, via the find_module and load_module methods.

The find_module method is where the interesting stuff actually happens. Within this method it checks whether the module needs to be rewritten (only test cases get modified, not product code) and generates the rewritten module. The load_module just plucks the result from the cache. find_module has some logic for caching and handling encoding issues etc., but passes control to the AssertionRewriter class to do the rewriting.

The AssertionRewriter follows the visitor pattern, which deals with a tree structure made up of lots of different types of things. In our case, we have an AST with different nodes representing different pieces of code: class definitions, method definitions, variable assignments, function calls etc. The AssertionRewriter goes depth-first through the tree (via an explicit stack of nodes) looking for assertions:

nodes = [mod]
while nodes:
    node = nodes.pop()
    for name, field in ast.iter_fields(node):
        if isinstance(field, list):
            new = []
            for i, child in enumerate(field):
                if isinstance(child, ast.Assert):
                    # Transform assert.
                    new.extend(self.visit(child))
                else:
                    new.append(child)
                    if isinstance(child, ast.AST):
                        nodes.append(child)
            setattr(node, name, new)
        elif (isinstance(field, ast.AST) and
              # Don't recurse into expressions as they can't contain
              # asserts.
              not isinstance(field, ast.expr)):
            nodes.append(field)

The key line is here:

if isinstance(child, ast.Assert):
    # Transform assert.
    new.extend(self.visit(child))

This triggers the actual operation. Everything else is just code for searching through the tree and leaving everything unchanged that isn’t part of an assert statement. The Python AST is quite well structured, but it is still designed primarily for execution and not to make our life easy. Therefore the code is a little more complex than we might like.

The call to self.visit() actually hits a method in the base ast.NodeVisitor class. This forwards the call to a method visit_<nodename> if one exists (visit_Assert in our case) or generic_visit otherwise. This is part of the Visitor pattern, and means we can have different code for handling each different case without having to write a tedious if .. elif .. elif dispatch block.
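A minimal visitor shows the dispatch in action (my own example, not Pytest code):

```python
import ast

class AssertCounter(ast.NodeVisitor):
    """visit() dispatches by node type name: Assert nodes land in
    visit_Assert; everything else falls back to generic_visit."""
    def __init__(self):
        self.count = 0

    def visit_Assert(self, node):
        self.count += 1
        self.generic_visit(node)  # keep walking below the assert

counter = AssertCounter()
counter.visit(ast.parse("assert True\nx = 1\nassert x == 1"))
print(counter.count)  # 2
```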

So we end up calling visit_Assert() with the assertion node. If we have an assertion like:

assert myfunc(x) == 42, 'operation failed'

the AST node will look like:

Assert(
    test=Compare(
        left=Call(
            func=Name(id='myfunc', ctx=Load()),
            args=[Name(id='x', ctx=Load())],
            keywords=[]
        ),
        ops=[Eq()],
        comparators=[Num(n=42)]
    ),
    msg=Str(s='operation failed')
)
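You can reproduce this shape yourself (the exact repr of the fields varies a little between Python versions):

```python
import ast

# Parse the assertion and pull out the single statement in the module body.
node = ast.parse("assert myfunc(x) == 42, 'operation failed'").body[0]
print(type(node).__name__)            # Assert
print(type(node.test).__name__)       # Compare
print(type(node.test.left).__name__)  # Call
```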

The test is the thing that we want to annotate, and Pytest deals with it like this:

# Rewrite assert into a bunch of statements.
top_condition, explanation = self.visit(assert_.test)

This again re-dispatches to the correct visit_... method, in our case visit_Call (which is actually visit_Call_35 on my version of Python, due to a change in Python 3.5 and later).

The function that handles the function call looks like this:

new_func, func_expl = self.visit(call.func)
arg_expls = []
new_args = []
new_kwargs = []
for arg in call.args:
    res, expl = self.visit(arg)
    arg_expls.append(expl)
    new_args.append(res)
for keyword in call.keywords:
    res, expl = self.visit(keyword.value)
    new_kwargs.append(ast.keyword(keyword.arg, res))
    if keyword.arg:
        arg_expls.append(keyword.arg + "=" + expl)
    else:  # **args have `arg` keywords with an .arg of None
        arg_expls.append("**" + expl)
 
expl = "%s(%s)" % (func_expl, ', '.join(arg_expls))
new_call = ast.Call(new_func, new_args, new_kwargs)
res = self.assign(new_call)
res_expl = self.explanation_param(self.display(res))
outer_expl = "%s\n{%s = %s\n}" % (res_expl, res_expl, expl)
return res, outer_expl

Let’s just look at a couple of these lines:

new_func, func_expl = self.visit(call.func)
# ...
new_call = ast.Call(new_func, new_args, new_kwargs)

This operates on the function that is being called, which in our case
is a lookup of a local name. In case that’s not clear: in Python, when
we have a call like:

my_function(42)

there are actually two steps: The name my_function is looked up in the local context to find out which function it refers to. Then the function is executed. At the moment we’re looking at the first of these steps.

The Name node (which looks up the function) is transformed by calling visit, which forwards to visit_Name. The result of this is new_func, which is packed into a new ast.Call node. The resultant AST still has the same effect of calling the function, but has more functionality wrapped round it. The AST will eventually be compiled and executed and will work like the original code, but behave differently in the case where the assertion fails.

OK, I’ve talked about “additional behaviour”, but what exactly does that amount to? When we call:

new_func, func_expl = self.visit(call.func)

what is getting returned?

Actually, the first value returned by visit_Name is just its first parameter, so new_func is actually just call.func. But func_expl contains some code that can be used to generate a human-readable name for the function:

def visit_Name(self, name):
    # Display the repr of the name if it's a local variable or
    # _should_repr_global_name() thinks it's acceptable.
    locs = ast_Call(self.builtin("locals"), [], [])
    inlocs = ast.Compare(ast.Str(name.id), [ast.In()], [locs])
    dorepr = self.helper("should_repr_global_name", name)
    test = ast.BoolOp(ast.Or(), [inlocs, dorepr])
    expr = ast.IfExp(test, self.display(name), ast.Str(name.id))
    return name, self.explanation_param(expr)

Let’s build this up a step at a time, assuming we’ve passed in the Name(id='myfunc', ctx=Load()) that was in the AST we examined above:

locs = ast_Call(self.builtin("locals"), [], [])

This is just AST-speak for something like:

locals()

Next:

locs = ast_Call(self.builtin("locals"), [], [])
inlocs = ast.Compare(ast.Str(name.id), [ast.In()], [locs])

We already know what locs is, so we can substitute it into the comparison:

"myfunc" in locals()

The next couple of lines:

dorepr = self.helper("should_repr_global_name", name)
test = ast.BoolOp(ast.Or(), [inlocs, dorepr])

If we ignore the self.helper stuff for now and just assume this is a call to _should_repr_global_name in the rewrite module, we can see this becomes:

("myfunc" in locals()) or (_should_repr_global_name(Name(id='myfunc',...)))

This then gets used as a condition in an if expression:

expr = ast.IfExp(test, self.display(name), ast.Str(name.id))

Note that this is an if expression, not
an if statement. In other words, we’re looking at something like this:

repr(myfunc) if (("myfunc" in locals()) or (...)) else "myfunc"

(I glossed over the details of self.display, which actually does a bit more than just generate a call to repr, but that will do for now).

The result of all this shenanigans is to produce two things:

  • an AST that can be evaluated at run time to generate the name of the function
  • func_expl, which is a formatting string with placeholders in it like %(py0)s and %(py1)s. These keys can be looked up in a dict stored on the AssertionRewriter object to obtain the AST that generates the correct diagnostic output.
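The placeholder mechanics themselves are just ordinary %-formatting against a dict. Here’s a simplified illustration with made-up keys and values, not Pytest’s actual bookkeeping:

```python
# Hypothetical stand-ins for the rewriter's stored sub-expressions.
explanations = {"py0": "myfunc(x)", "py1": "42"}
template = "%(py0)s == %(py1)s"
print(template % explanations)  # myfunc(x) == 42
```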

It’s worth stepping back at this point and reiterating why we do things in this roundabout way: generating an AST that generates the value rather than just generating the value. The answer is that a lot of the values aren’t known until later on when the code is executed. We’ve focused on the function name as an example because it’s simple, but (in most cases) the function name could just be dumped out directly by examining the source code. However, we can’t print the values of the arguments to the function or the return value from the function, because we don’t know what those values are yet.

All the above gets carried out for various other pieces of the input AST: comparisons, boolean operators, arithmetic etc. When the dust settles and we finish processing the visit_Assert, we need to generate a little more code:

# Rewrite assert into a bunch of statements.
top_condition, explanation = self.visit(assert_.test)
# Create failure message.
body = self.on_failure
negation = ast.UnaryOp(ast.Not(), top_condition)
self.statements.append(ast.If(negation, body, []))

By now you’ve hopefully got the trick of interpreting this code: We’re adding an element to self.statements that is the AST for code like the following:

if not (condition):
    explanation

where condition is just the original expression that was inside the assert statement, and explanation is the code that generates the diagnostics.

We’re finally there. Once the AssertionRewriter finishes walking through the tree, we have an AST that can be passed to compile. A little more housework is required to create a module object and put it in sys.modules, but nothing terribly complicated.

The amazing thing about this is that the rest of the Python system doesn’t know that there’s anything unusual about this decorated module. Calling the Python code within works the same as calling any other Python module.

I think this is pretty impressive stuff, given how powerful it can be and how (relatively) little internal knowledge is required. Hopefully in the coming months I’ll dig a little deeper into some of this stuff and tear down some other examples.

Assertion rewriting in Pytest part 3: The AST

Part 1 and part 2 looked at why a test framework might need to handle method calls in an unusual way, and how we might go about implementing this by the simplest possible means.

The approach in part 2 has a couple of limitations, but one is that we’re using eval, which has a very limited interface. If we have something like:

result = eval("len(some_list) > 10")

then eval lets us find out whether the condition succeeds or fails, which is useful. But ideally we want to go further, and figure out how exactly the condition failed. We know that len(some_list) was less than or equal to 10, but exactly what was it? Maybe it would be nice to see the elements of some_list printed out to aid debugging. eval doesn’t give us any of this.

The problem is that eval is actually doing too much at once. There are broadly three steps:

  • Parsing: What is the structure of len(some_list) > 10? This stage converts a simple string of characters into a richer structure whose shape encodes everything we need to know to execute the code, and nothing more.
  • Compiling: Convert this into a form that the computer can execute.
  • Execution: Find the outcome from executing the code with the specific input values in question.

If we can split these apart, then it’s much easier to write code that manipulates the behaviour of the piece we are interested in.

Let’s look at the first stage for now.

I glibly talked about the structure having “everything we need, and nothing more”, but that’s not really a practical starting point. What should such a structure actually look like?

Without digressing too much, here’s an example of the sort of structure that’s used in Python:

[Diagram: the AST for len(some_list) < 10]

            <
          /   \
      call     10
     /    \
   len    some_list

If you look at this from the top down, you’ll first see a less-than (<) operator that has two arguments. The right hand argument is a simple number, but the left-hand argument is more complex: it’s made up of a function (len) applied to a further argument (some_list). Some points about this structure:

Firstly, it’s not necessarily obvious from the trivial example here, but this is a tree structure. As you look further down it can split apart into different branches, but the branches never join up with each other to form loops. This makes the structure much easier to work on at the cost of limiting what we can express with it; it turns out that this particular trade-off is a good one.

Secondly, the value of a node in the tree is only affected by the nodes below it, not all the other nodes in the tree. For the len node, the function has the same value whether it’s appearing in len(some_list) < 10 or len(some_list) > 6. This corresponds to what we expect, and is a good sign that this structure is going to make our life easier than if we used a different structure.

This sort of structure is called an Abstract Syntax Tree or AST and as I say, most languages use something like this. However, Python is relatively unusual in the degree to which it exposes the AST to you as a programmer. You can just import the ast module and start playing around:

>>> import ast
>>> tree = ast.parse('len(some_list) > 10', mode='eval')
>>> ast.dump(tree)
"Expression(body=Compare(left=Call(func=Name(id='len', ctx=Load()),
args=[Name(id='some_list', ctx=Load())], keywords=[]), ops=[Gt()],
comparators=[Num(n=10)]))"

If I rearrange that dump a little bit (and cut out some details), we can see:

Expression(
    body=Compare(
        left=Call(
            func=Name(id='len', ...),
            args=[
                Name(id='some_list', ...)
            ],
            ....
        ),
        ops=[Gt()],
        comparators=[
            Num(n=10)
        ]
    )
)

It doesn’t take too much imagination to see that the diagram above is a rough approximation to this.

So we’re peering into the first stage of executing the code; we’re actually seeing inside the workings of eval. But that’s not all. This is real working Python code, and we can prove this by carrying on with the compile and execute stages:

>>> bytecode = compile(tree, '<string>', mode='eval')
>>> eval(
        bytecode,
        {},
        {
            'some_list': [1, 2, 3]
        }
    )
False

We just did the compile and execute (second and third) stages ourselves, and got the same output as if we had let Python do it for us inside of eval.

It’s worth reiterating: this isn’t just a curiosity. This is a practical tool that you can use in real Python code, and you can do this with just the tools in the standard library. I’ll look at how this might be used in practice in the next part.