Underhanded Python: giving the debugger the wrong line numbers

In my last post I showed how it’s easy for Python code to detect the presence of the debugger and change its behaviour accordingly. But this did nothing to deal with single-stepping through the code in the debugger: the debugger would show execution hitting the debugger-detection code and changing its behaviour. You could probably disguise this code a bit, but it’s hard to avoid giving the impression that something suspicious is going on.

Luckily we have another tool at our disposal: we can lie to the runtime about our line numbers.

How single-stepping works in the Python debugger

Executing Python is a 3-stage process:

  • the source code is parsed into an abstract syntax tree (AST)
  • the AST is compiled into bytecode
  • the bytecode is executed by the interpreter

The first stage is the only part that works with the source code, and debugging takes place at the execution phase. Therefore at each stage the line numbers need to be preserved in some way to be passed on to the next phase.
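You can drive all three stages by hand from Python itself. Here's a quick sketch (CPython normally does all of this for you behind the scenes):

import ast

source = "a = 1 + 1"

tree = ast.parse(source)                 # stage 1: source code -> AST
code = compile(tree, "<demo>", "exec")   # stage 2: AST -> bytecode
namespace = {}
exec(code, namespace)                    # stage 3: execute the bytecode
print(namespace["a"])                    # 2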

You can see this when you parse the code. Code like this:

a = 1 + 1

gets converted to an AST like:

Module(
    body=[
        Assign(
            targets=[Name(id='a', ctx=Store())],
            value=BinOp(left=Constant(value=1), op=Add(), right=Constant(value=1)),
            lineno=1)])

You can inspect this AST and see the line numbers:

>>> import ast
>>> code = "a = 1 + 1"
>>> tree = ast.parse(code)
>>> print(tree.body[0].lineno)
1

When this AST is compiled into bytecode, the line numbers get attached to the bytecode. The details are a bit complicated because Python tries to make the bytecode as compact as possible, but conceptually there’s a mapping from each bytecode instruction to the corresponding line of the source code it originally came from.

Of course, each line of the source code will typically end up generating many bytecode instructions, because individual bytecode instructions are so much less powerful than Python code (that’s kind of the point of them). So you end up with runs of bytecode instructions that all correspond to the same line number. The disassembler shows you this:

import dis

source = """
def add(a, b):
    print("Adding")
    return a + b
"""

code = compile(source, 'foo.py', 'exec')

dis.dis(code)

produces:

  2           0 LOAD_CONST               0 (<code object add at 0x7f6ceef89810, file "foo.py", line 2>)
              2 LOAD_CONST               1 ('add')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (add)
              8 LOAD_CONST               2 (None)
             10 RETURN_VALUE

Disassembly of <code object add at 0x7f6ceef89810, file "foo.py", line 2>:
  3           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Adding')
              4 CALL_FUNCTION            1
              6 POP_TOP

  4           8 LOAD_FAST                0 (a)
             10 LOAD_FAST                1 (b)
             12 BINARY_ADD
             14 RETURN_VALUE

where the left-hand numbers are the source code lines.

This brings us back to sys.settrace. When you register a trace function in Python, you can arrange for your trace function to be called on various events. For our purposes, we care about:

  • call: entering a function or code block
  • line: a new line of code is about to be executed, i.e. the next bytecode instruction is associated with a different line number than the previous one.

The practicalities are a bit fiddly, but essentially this gives us what we need for a single-step debugger: a function gets called on each new line, and that function can suspend execution and allow the system to be inspected.
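As a sketch of the idea, here's a toy "single-stepper" built on sys.settrace (nothing like a real debugger, but it shows the raw ingredients): it pauses before each new line and reports the line number the runtime claims is about to run. The names here are just for illustration.

import sys

def single_step(frame, event, arg):
    if event == "line":
        # frame.f_lineno is derived from the code object's line number table --
        # the very information this post is about faking.
        print(f"about to run line {frame.f_lineno} of {frame.f_code.co_name}")
        input("press Enter to step...")  # crude stand-in for a debugger prompt
    # returning the trace function keeps line events coming for this frame
    return single_step

def target():
    a = 1
    b = 2
    return a + b

sys.settrace(single_step)
target()
sys.settrace(None)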

How we can lie to the debugger

The debugger relies on the line number information attached to the function's code object. That information is readable like any other Python attribute:

def add(a, b):
    print("adding")
    return a + b

print(add.__code__.co_firstlineno)
print(add.__code__.co_lnotab)

The co_firstlineno is simply the source line of the start of the function, and the co_lnotab is a table mapping bytecode positions to source code positions. In our case, we get:

1
b'\x00\x01\x08\x01'

The second line is a sequence of byte pairs, each giving a bytecode offset increment and a line number increment:

  • at bytecode offset 0 (00), the line number increases by 1 from the start of the function (01)
  • 8 bytes further on (08), the line number increases by 1 from the previous line (01)
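You don't have to decode those bytes by hand: dis.findlinestarts will unpack the table into (bytecode offset, source line) pairs. Applied to the add defined above:

import dis

# one (offset, line) pair for each run of instructions sharing a source line
for offset, lineno in dis.findlinestarts(add.__code__):
    print(offset, lineno)

# 0 2
# 8 3   (exact offsets can vary between Python versions)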

Let’s suppose we have two functions: our malicious function, and a decoy that we want the user to see when they single-step through the code:

import sys

def add(a, b):
    return a + b

def bad_add(a, b):
    if sys.gettrace() is None:
        print("Doing something malicious")

    return a + b

You might hope that you could just overwrite bad_add's line number information:

bad_add.__code__.co_firstlineno = add.__code__.co_firstlineno

But this doesn’t work, because code object fields are read-only in Python. What you can do instead is create a new code object that has all the same properties as the old one, except for the ones we want to rewrite. You can build one from scratch with types.CodeType, but the interface is a bit fiddly; as of Python 3.8 you can do this much more easily by calling the replace method on the old code object, which creates a modified copy:

add.__code__ = bad_add.__code__.replace(
    co_lnotab=add.__code__.co_lnotab,
    co_firstlineno=add.__code__.co_firstlineno
)

This replaces the entire add code object with a newly created one. It only replaces the code of add; it doesn’t change the function's other properties (such as its __name__). In other words, add will continue to look like the old add, but it will walk and quack like bad_add.

But the clever bit is that the new code object is a hybrid: it has bad_add's behaviour, but add's line number information. If you try to step through it in the debugger, the debugger will show you (line by line) the source code of add even as it executes bad_add.
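A quick check of the hybrid (assuming the definitions above have been run):

print(add(2, 3))                    # runs bad_add's body: prints the malicious message, then 5
print(add.__code__.co_firstlineno)  # still reports the line where the original add starts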

This was pretty trivial to do, but to be fair it doesn’t actually work all that well. The source code mappings for add aren’t really a good substitute for those for bad_add, because they have different bytecode. In our case we get away with it because add is pretty trivial, but in real code this would probably lead to odd behaviour: single-stepping would cause the debugger to leap around the function in odd ways that didn’t correspond to the code, and it would be pretty obvious that something was wrong.

However, for a motivated attacker it wouldn’t be that hard to create a more accurate fake source code mapping, which would make this much harder to track.

Of course, this code is still pretty obviously malicious if someone looks at the right piece of source code. The malicious piece can be hidden somewhere well away from the code that someone would be likely to inspect (in a different module, even) but once it’s spotted it’s pretty obvious that it’s doing something underhanded. I have a few thoughts about how this could be addressed, which I’ll talk about next time.

Underhanded Python: detecting the debugger

There used to be an annual competition among C programmers called the Underhanded C contest, with the aim of inventing creative ways to write code that appears to do one thing but is actually doing something very different (and theoretically malicious, though it’s a “white hat” contest).

I recently got to thinking about whether you can do this kind of thing in Python. The original format of the contest doesn’t really work in Python: it’s all too easy to write Python code that doesn’t do what it seems, to the point where there’s probably no challenge. But in a dynamic language, there are lots of interesting things that aren’t realistic in a language like C.

For example, can you detect and interfere with the debugger? Can you hide your malicious behaviour when a debugger is attached to look at it? It turns out you can.

Debugging in Python

It’s pretty clear that you can’t implement a Python debugger wholly in Python without any support from the Python runtime. Python code will only run when something calls it, and your debugger code wouldn’t have any way to impose itself upon the code being debugged.

However, Python tries to make things as flexible as possible by implementing a minimal amount of support in the Python runtime and having the rest of the debugger built on top of that in Python. The key tool that makes this work is something I mentioned in an earlier article about coverage testing: sys.settrace.

The way settrace works is that you can register a hook function with it that will be called on some conditions (moving to a new line of code, entering a function scope etc.). This hook is an ordinary Python function that can do whatever you want. The implementation of settrace is built into the Python runtime, but that’s all the special support you need.
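For example, here's a minimal sketch of a coverage-style hook (a real debugger is much fiddlier, but the mechanism is exactly this): it records every line number that actually executes. The names are just for illustration.

import sys
from collections import defaultdict

executed = defaultdict(set)

def hook(frame, event, arg):
    if event == "line":
        executed[frame.f_code.co_name].add(frame.f_lineno)
    # returning the hook keeps tracing switched on inside each frame
    return hook

def example():
    x = 1
    if x > 10:
        x += 1
    return x

sys.settrace(hook)
example()
sys.settrace(None)

print(dict(executed))   # shows which lines of example() actually ran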

How to behave differently when being debugged

Let’s keep things simple. Let’s suppose we want to write a simple function that adds two numbers, and prints "I'm malicious" if it’s called when the debugger isn’t around to see it. Something like:

def add(a, b):
    if not in_debugger():
        print("I'm malicious")

    return a + b

It’s pretty simple, we just have to check whether the settrace hook is set:

import sys

def add(a, b):
    if sys.gettrace() is None:
        print("I'm malicious")

    return a + b
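A quick way to see both behaviours without firing up pdb is to install a throwaway trace hook as a stand-in for the debugger (any non-None hook will do):

print(add(2, 3))    # no hook installed: prints "I'm malicious", then 5

sys.settrace(lambda frame, event, arg: None)   # pretend a debugger is attached
print(add(2, 3))    # just prints 5
sys.settrace(None)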

This works pretty well. If you run it in pdb, you’ll see that the message is not printed. One quirk (for better or worse) is that if you do a continue from pdb it will not detect the debugger, because continue removes the debug hook entirely (at least when no breakpoints are set) until it’s reinserted with set_trace() or breakpoint() or whatever. Depending on the requirements for our malicious program this might or might not be a problem.

This isn’t specific to pdb, either. It should work with any debugger for CPython (and may work for other Python implementations, but I haven’t checked).

Of course, this is a toy example. The most obvious problem is that if you step into the function you’ll immediately see the malicious code. I have a few ideas about this, which I plan to return to later.

Python AST diagrams in WordPress

I’ve updated my Python visualiser WordPress library to support generating diagrams of Python Abstract Syntax Trees (AST). It’s still at a very early stage of development and only supports a tiny subset of AST node types, but I thought people might be interested to see what was in development.

You can produce an AST diagram for a snippet like:

a = 1 + 2

from an editor embedded in Gutenberg.

New in Python 3.8: Assignment expressions

It’s quite rare for mature programming languages to introduce new operators. I suppose there’s no good reason for this; if a new operator is useful and doesn’t break anything existing in the language, then it’s a win. But it feels like it’s much easier to persuade people that a language needs a new standard library function than new syntax.

As Python 3.8 gradually spreads into common usage, now might be a good time to look at one of the big changes it introduced, assignment expressions or the affectionately-nicknamed “walrus operator”:

while (block := f.read(256)) != '':
    process(block)

That := is the new kid on the block.
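For comparison, before 3.8 the same loop typically needed the read duplicated (or a break in the middle), something along these lines:

block = f.read(256)
while block != '':
    process(block)
    block = f.read(256)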

What you can do with this

The most common reason this will be useful is where you want to use an if, while or other conditional operation and you also want to assign to a variable while you’re doing it.

If you do much text processing with regular expressions, you’ve probably come across this a lot:

m = re.match(r'(\d+)', input)
if m:
    do_stuff(m.group(1))

If you’ve come to Python from another language, you may at first be surprised that you can’t write this in a more compact way:

if m = re.match(r'(\d+)', input):
    do_stuff(m.group(1))

This looks pretty tempting, but it isn’t legal syntax. But the new operator allows you to do this:

if m := re.match(r'(\d+)', input):
    do_stuff(m.group(1))

This fixes a minor annoyance, but it doesn’t seem like a big deal. However, in the real world programmers tend to prefer writing compact code even if it’s less efficient, and will write things like:

if len(options) > max_options:
    print(f"Too many options: {len(options)}")

This calls len() twice, just to print the error message. In a case like this it isn’t going to matter, but in performance-critical code where the duplicated operation was something more expensive this could turn out to be significant.
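An assignment expression lets you compute the length once and still use it in the message; one way to write it (with a hypothetical name n for the result):

if (n := len(options)) > max_options:
    print(f"Too many options: {n}")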

Why is this even necessary?

The obvious question is: Why doesn’t Python let you write the expression you want to write with a simple = operator?

Other programming languages manage this just fine. C and C++ do it all the time:

while ((c = getchar()) != EOF) {
    //...
}

You can do it in Ruby:

while line = gets
  # process line
end

You can do it in Java:

while ((line = reader.readLine()) != null) {
    System.out.println("You entered " + line);
}

You can do it in Javascript:

while (x = x - 1) {
    console.log(`x is ${x}`);
}

Is Python just being deliberately difficult here?

A digression about expressions

Programming languages vary immensely, but in general you can distinguish statements and expressions.

An expression is a chunk of code that results in some value, such as 1 + 2 or name.reverse().

Expressions are powerful because you can (typically) use an expression anywhere a value is expected, including inside another expression. Therefore you can have arbitrarily complicated expressions.

A statement is a chunk of code that results in some action or state change, such as import left_pad or print("hello " + name) or num_socks = 2 * num_feet.

The body of a function (or a module or other code block) is a series of statements.

The obvious question is whether there’s any overlap between expressions and statements. An expression on its own can be treated as a trivial statement, which just evaluates the value and discards it. For example, in Python you can write:

def my_func():
    1 + 1

The line 1 + 1 doesn’t do much here: the value is calculated and then discarded, and the function returns None. More usefully, some expressions will have side-effects:

def speak_my_weight(person):
    audio.speak(person.weight)

Here the audio.speak(person.weight) is an expression (it calls a function and yields a value) that’s being used as a statement, because of its side effects.

So an expression is a statement, but is a statement also an expression? That depends which language you’re using. There are three possibilities:

  • Statements are never able to be expressions (unless they are trivial)
  • Every statement is an expression
  • It depends on the statement

The first option isn’t really desirable. Languages like Lisp and Ruby go for the second option. Python takes the third.

Why would you want all statements to be expressions?

The nice thing about the rule “every statement yields a value” is that it’s really easy to explain, and really easy to remember.

Languages become more expressive by having a small number of rules that can be combined in lots of different ways; that way you get maximum power while taxing the programmer’s brain the minimum amount.

So in Ruby, for example, you can do something like:

songType = if song.mp3Type == MP3::Jazz
             if song.written < Date.new(1935, 1, 1)
               Song::TradJazz
             else
               Song::Jazz
             end
           else
             Song::Other
           end

The fact that an if block returns a value means that you don’t have to do an explicit assignment in the branches of the block. The assignment is done only once. This is a little bit forced in this case, but you can imagine if the destination of the assignment was something complex it might be nice not to have to repeat it.
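For comparison, a rough Python equivalent (with simplified stand-in names, since the Ruby constants above don't exist in Python) has to perform the assignment in every branch, because Python's if is a statement:

from datetime import date

class Song:
    def __init__(self, genre, written):
        self.genre = genre
        self.written = written

song = Song("jazz", date(1930, 6, 1))

if song.genre == "jazz":
    if song.written < date(1935, 1, 1):
        song_type = "trad_jazz"
    else:
        song_type = "jazz"
else:
    song_type = "other"

print(song_type)   # trad_jazz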

Why wouldn’t you want all statements to be expressions?

Having a small number of rules that can be combined in infinite ways is very elegant, but there are always edge cases where the human brain doesn’t work like that, which can lead to confusion.

Every C programmer has done this once in their life:

if (a = 10) {
  printf("a is 10\n");
}

This doesn’t do what was intended, because it’s assigning 10 to a rather than checking its value (so the condition is always true). The reason this compiles at all is that C takes the view that an assignment is an expression that yields a value, so there’s no reason you shouldn’t be able to use the resultant value in an if condition. The fact that this is plainly not what a human being would want is no concern of the C compiler.

“But wait!”, I hear you cry, “my linter would have caught that mistake. There’s no need to forbid it in the language spec when tooling can catch it.”

This is a fair position, but the truth is messier. If you have such a linter rule enabled on every project you work on and every team you work with, then it might as well be fixed in the language. If you don’t, you’ll get confused when you switch teams.

If you break the rule frequently, then you’ll have to have ugly annotations to disable the linter all over the place. If you break the rule very rarely, why is it such a big deal if the language forces you to work around it in a few rare cases?

The Python way

Python generally chooses explicit but slightly more verbose code over simpler code that can trip people up. You can judge that it makes the wrong decision if you like, but the language can’t please everyone.

Therefore Python’s ordinary assignment statement doesn’t yield a value that can be used inside another expression.

Hang on a minute…

If you’re paying attention, you may have been starting to wonder about the Python construction:

a = b = c = 10

This technique for initialising multiple variables is popular and available in a lot of languages (though personally I’ve never found it useful). It’s often possible for a language to compile this by treating it this way:

a = (b = (c = 10))

This wouldn’t make any sense in Python, because c = 10 isn’t an expression so can’t be assigned to b. What’s going on here?

Python simply treats this as a special case. An assignment in Python can have multiple targets, and so Python treats the statement a = b = c = 10 as one single assignment, with a value of 10 and three targets: a, b and c.
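You can see this in the AST: parsing the statement gives a single Assign node with three targets (output trimmed slightly; the exact fields vary between Python versions):

import ast

print(ast.dump(ast.parse("a = b = c = 10").body[0]))

# Assign(targets=[Name(id='a', ctx=Store()),
#                 Name(id='b', ctx=Store()),
#                 Name(id='c', ctx=Store())],
#        value=Constant(value=10))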

I think this is an example of Python taking the practical path: it seems somehow neater and simpler if you don’t have to have a special rule to deal with multiple assignments, but it doesn’t actually save much complexity in the Python implementation. In return for biting the bullet and treating this as a special case, developers who write in Python benefit from a language that more frequently does what they expect.