How does q work?

In a previous post I talked about what you can do with the debug logging library q. It has some interesting features that don’t look like standard Python, though of course it’s all standard Python underneath.

Debug anywhere

Start with the simple things. One of the nice things about q is that you can stick a debug call anywhere, without having to change the logic of your code. So if you have a complex expression like:

if something(foo) + other(bar) > limit:
    ...  # Do something

You can just stick a q() call in there to print out the intermediate value that you’re interested in:

if q(something(foo)) + other(bar) > limit:
    ...  # Do something

All we have to do here is have q() return the argument it was given:

def q(arg):
    # Log out the value of arg
    return arg

It’s a little more complicated in practice, because q() supports being called with any number of arguments, and only returns the first. But for the common case, we can assume we have code like the above: the value is passed into a function and handed straight back.
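Here is a minimal sketch of that multi-argument version. It is not q’s actual implementation. It prints to standard output, while the real library formats and writes its log differently. Returning None when there are no arguments is an assumption of this sketch. The toy definitions of something, other, foo, bar and limit exist only so the example runs.

```python
def q(*args):
    """Simplified, hypothetical stand-in for q(): log every argument, return the first."""
    for arg in args:
        print("q:", repr(arg))  # stdout keeps the sketch simple; the real q logs differently
    return args[0] if args else None  # assumption: no arguments means nothing to pass through


# Toy placeholders for the names used in the post's example.
def something(x):
    return x * 2


def other(x):
    return x + 1


foo, bar, limit = 3, 4, 10

if q(something(foo)) + other(bar) > limit:  # logs 6; the comparison still sees 6 + 5
    print("over the limit")
```

The call site only gains the q( and ), and the condition evaluates exactly as it would without them.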

Will this always work? Is there ever a case where it matters that the value has been passed through a function before use rather than being used directly? In lower-level languages this kind of thing can matter, because returning a value from a function means moving the value from one place to another. This could involve copying the value, or even constructing a new instance. But in Python, arguments and return values are just references. The function receives a reference to the very object that was passed in and returns a reference to that same object. Nothing is copied or constructed, so the caller ends up with exactly the object it would have used directly. The only side effect is an extra reference held while the function runs. That reference is released when the function returns, which decrements the object’s reference count again.
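One way to convince yourself of this is to pass an object through a do-nothing function and check that what comes back is the very same object. The Counted class here is purely illustrative. It counts how many instances have ever been constructed, so it would notice if the round trip built a new one.

```python
class Counted:
    """Illustrative class that counts how many instances have been created."""
    created = 0

    def __init__(self):
        Counted.created += 1


def passthrough(arg):
    """Return the argument unchanged, like a stripped-down q()."""
    return arg


original = Counted()
returned = passthrough(original)

print(returned is original)   # True: the same object, not a copy
print(Counted.created)        # 1: no new instance was constructed

items = [1, 2, 3]
passthrough(items).append(4)  # mutating the returned object...
print(items)                  # [1, 2, 3, 4]: ...mutates the original
```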

Smooth operator

Next, we can look at q’s nifty operator magic that lets us tag values to be logged without having to wrap them in brackets.

# This logs out the value of thing.flags
if q/thing.flags & (1 << which_flag):
    apply_flag(thing)

This just makes use of Python operator overloading. This is a feature that doesn’t get used all that much: the idea is that you can define the meaning of the built-in operators like +, * and & when applied to instances of the class you are defining.

In theory operator overloading is really neat, because you can define some new class and have it interact with the language’s operators as if it were a built-in type like int or float. This is used extensively in C++ and Haskell. Why not in Python?

I think the reason is that there are really two sorts of cases where you want this:

  • You’re implementing something which really is a variation on a numeric type: perhaps a rational number class, a decimal class or a tensor class
  • You’re implementing something else that doesn’t really have numeric operations, but you want to use operators rather than explicit method calls to make things a bit more concise.
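To make the first case concrete, here is a minimal sketch of a numeric-style class. Vec2 is a hypothetical two-dimensional vector made up for this example. It supports + between vectors and * with a plain number. Returning NotImplemented for unsupported operand types lets Python try the other operand or raise the usual TypeError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2D vector, used only to illustrate numeric-style overloading."""
    x: float
    y: float

    def __add__(self, other):
        if not isinstance(other, Vec2):
            return NotImplemented  # let Python report an unsupported operand
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, k):
        if not isinstance(k, (int, float)):
            return NotImplemented
        return Vec2(self.x * k, self.y * k)

    __rmul__ = __mul__  # makes `2 * v` work as well as `v * 2`


print(Vec2(1, 2) + Vec2(3, 4))  # Vec2(x=4, y=6)
print(2 * Vec2(1, 2))           # Vec2(x=2, y=4)
```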

The first of these is pretty rare, because Python already has sensible numeric types (including fractions.Fraction and decimal.Decimal in the standard library) and a couple of fairly standard third-party libraries like NumPy. You’re not likely to be writing your own class that acts numeric.

The second is more likely to occur, but the culture of Python (explicit is better than implicit) tends to lean against it. The only two cases I can think of off the top of my head are Django Q() objects (nothing to do with the debugging library discussed here), which combine query conditions with & and |, and Scapy, which uses / to stack packet layers. Overloading might be the right choice there, but you should really weigh the cost of confusing the user with non-standard behaviour against the benefit you get in terms of conciseness.
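The second case looks more like the following sketch. Filter is a hypothetical class, not Django’s Q class or any real library’s API. It wraps a predicate and lets you combine predicates with &, | and ~ instead of explicit method calls. The rows are made-up toy data. The result reads concisely, but a newcomer has to know that & here means “both conditions hold” rather than bitwise AND, which is exactly the trade-off described above.

```python
class Filter:
    """Hypothetical predicate wrapper combined with &, | and ~ (illustration only)."""

    def __init__(self, test):
        self.test = test  # a function taking a row (dict) and returning bool

    def __and__(self, other):
        return Filter(lambda row: self.test(row) and other.test(row))

    def __or__(self, other):
        return Filter(lambda row: self.test(row) or other.test(row))

    def __invert__(self):
        return Filter(lambda row: not self.test(row))

    def __call__(self, row):
        return self.test(row)


adult = Filter(lambda row: row["age"] >= 18)
suspended = Filter(lambda row: row["suspended"])

can_post = adult & ~suspended  # reads like a condition, but builds a new Filter

rows = [
    {"name": "ann", "age": 34, "suspended": False},
    {"name": "bob", "age": 15, "suspended": False},
    {"name": "cat", "age": 40, "suspended": True},
]
print([row["name"] for row in rows if can_post(row)])  # ['ann']
```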

So anyway, q goes ahead and uses operator overloading. What does that look like?

class Q(object):
 
    # ...
 
    def __truediv__(self, arg):  # a tight-binding operator
        """Prints out and returns the argument."""
        info = self.inspect.getframeinfo(self.sys._getframe(1))
        self.show(info.function, [arg])
        return arg
    # Compat for Python 2 without "from __future__ import division" turned on
    __div__ = __truediv__

All this is doing is declaring a method on the class with the special name __truediv__. When Python comes across an expression involving /, like:

a / b

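it actually looks up the special method __truediv__ on the type of the left-hand object and calls it with a and b as parameters, roughly type(a).__truediv__(a, b). If that method is missing or returns NotImplemented, and the operands are of different types, Python gives the right-hand object a chance via its reflected method, __rtruediv__. Looked at this way, the operator behaviour of the built-in types like int and float is actually a special case of the more general process of “dividing” any two objects.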
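Here is a stripped-down, q-like class that shows this dispatch in action. Tap is a hypothetical name. Unlike q, it prints to standard output and skips the frame inspection. The __rtruediv__ method is included only to show the fallback Python uses when the object is on the right-hand side of /.

```python
class Tap:
    """Hypothetical, simplified q-like object: `tap / x` logs x and returns it."""

    def __truediv__(self, arg):
        print("tap:", repr(arg))
        return arg

    def __rtruediv__(self, other):
        # Python calls this for `other / tap` once other's own __truediv__
        # has returned NotImplemented (as int's does for a Tap).
        return f"{other!r} / Tap"


tap = Tap()

value = tap / 10 & 6                 # `/` binds tighter than `&`: logs 10, then 10 & 6
explicit = Tap.__truediv__(tap, 10)  # roughly what `tap / 10` turns into
reflected = 3 / tap                  # int can't divide by a Tap, so Tap.__rtruediv__ runs

print(value, explicit, reflected)    # 2 10 3 / Tap
```

The first line mirrors the q/thing.flags & (1 << which_flag) example. Because / binds more tightly than &, only the left operand of & is logged.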

The only limit is your imagination. Or rather, the only limit is whether this is a sensible thing to do in your case, and the answer is that it very rarely is. But I think it’s excusable in the case of q, which is going for extreme brevity. Also, q isn’t meant to be something you build into your shipping code, just something you use occasionally in debugging.

Import shenanigans

Something else might strike you after you’ve been peering at q for a while. It’s subtle, but I think it’s the most interesting (or perhaps the most underhanded) thing in the library.

You’ll notice that when you use q, you just have to do:

import q

And you have an object called q in scope. That’s it. There’s no need to qualify the object with the module name (q.q), or to import it explicitly with:

from q import q

This is a bit reminiscent of export default in JavaScript: the module only has one thing in it, and we get that rather than a whole namespace of stuff. But lots of Python modules have only one useful class in them, and they don't behave like this. How does q manage it?

First, let's ask what a module really is after we've imported it in Python. From our point of view a module is a text file with code in it. By the time Python can use it, that code has been compiled to bytecode and executed, producing a module object. Python caches these module objects in sys.modules, a dict mapping module names to module objects:

>>> import sys
>>> 'statistics' in sys.modules
False
>>> import statistics
>>> sys.modules['statistics']
<module 'statistics' from '/usr/lib/python3.6/statistics.py'>

The module objects in sys.modules have everything defined at the top level of the module as attributes:

>>> sys.modules['statistics'].harmonic_mean
<function harmonic_mean at 0x7f71f426d6a8>

In other words, the entry in sys.modules is the thing that gets brought into scope when you do import statistics or whatever. Your code gets a new name in its namespace that refers to the same object stored in the sys.modules dict.
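You can check this directly. The name that import binds and the entry in sys.modules are one and the same object. Importing the module again, even under another name, simply hands back that cached object rather than running the module's code a second time.

```python
import sys
import statistics

# The name bound by `import` is the object cached in sys.modules.
print(statistics is sys.modules["statistics"])            # True

# A second import finds the cached entry and binds the same object again.
import statistics as stats_again
print(stats_again is statistics)                           # True

# Attribute lookups on the name go straight to that shared module object.
print(statistics.mean is sys.modules["statistics"].mean)  # True
```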

But the thing we get when we do import q doesn't act quite like a module. As we explored above, it acts like an instance of the Q class, not a module object. If it weren't a Q, the operator overloading and callable behaviour described above wouldn't work.

This works because q has one last trick up its sleeve: just before the module is done, it does the following:

# Install the Q() object in sys.modules so that "import q" gives a callable q.
sys.modules['q'] = Q()

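It's overwriting the entry in sys.modules with something else, in this case an instance of Q. This works because, once a module's code has finished running, the import system binds whatever object is in sys.modules under that name at that point. That is not necessarily the module object it originally created. It's a bit sneaky, though. I don't know if anything in Python is going to assume that everything in sys.modules is an instance of module, but if it did you could hardly blame it.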
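Here is a self-contained sketch of the same trick using only the standard library. It writes a tiny throwaway module to a temporary directory and imports it. The module name shoutdemo and the class Shouter are made up for this example and stand in for q and Q. Importing the module gives back the Shouter instance rather than a module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a throwaway, hypothetical module that copies q's trick:
# its last statement replaces its own sys.modules entry with an instance.
MODULE_SOURCE = '''
import sys


class Shouter:
    """Stand-in for q's Q class (illustration only)."""

    def __call__(self, *args):
        print("called with", args)
        return args[0] if args else None

    def __truediv__(self, arg):
        print("divided by", repr(arg))
        return arg


sys.modules[__name__] = Shouter()
'''


def import_replaced_module(name="shoutdemo"):
    """Write MODULE_SOURCE to a temporary directory and import it as `name`."""
    tmpdir = tempfile.mkdtemp()
    Path(tmpdir, name + ".py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()  # make sure the new directory is scanned
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(tmpdir)  # the imported object stays cached in sys.modules


shout = import_replaced_module()
print(type(shout).__name__)               # Shouter, not module
print(shout / 42)                         # logs "divided by 42", then prints 42
print(sys.modules["shoutdemo"] is shout)  # True
```

As with q, the class can't be found as an attribute of the imported object, so hasattr(shout, "Shouter") is False.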

As an aside, this makes the class Q inaccessible by name to the programmer using the library. It never gets into your namespace, and because it isn't an attribute of the object in sys.modules, you can't look it up on q or import it. The one exception is asking for type(q), which still returns it:

>>> import q
>>> q.Q
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Q' object has no attribute 'Q'
>>> from q import Q
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'Q'
