Embedding Python widgets in WordPress

I’ve got a new project that I’ve been working on lately. My thinking is that a lot of programming topics are easier to explain with small interactive tools, but standard publishing tools are targeted towards text and don’t give you a way to incorporate code. If you go your own way and publish a code-driven page, you end up reinventing lots of tooling for editing, hosting, caching, comments and so on.

Since the release of Gutenberg, WordPress makes it quite easy to embed custom Javascript widgets cleanly. So how about using that to teach Python?

The widget below is a working Python tokeniser. You can edit the code however you like and see the output, with a pretty accurate emulation of the official cpython tokeniser:

foo = "hello"
print(foo + "world")

This is just a first step, and it’s not very general at the moment. However, this demonstrates a lot of the required steps and should be a good basis to build from.
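If you want to play with the same idea locally, the standard library’s tokenize module exposes the real tokeniser that the widget emulates. A minimal sketch, using the widget’s example source:

```python
import io
import tokenize

source = 'foo = "hello"\nprint(foo + "world")\n'

# Tokenise the source string the same way CPython's tokeniser does.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for name, text in tokens:
    print(name, repr(text))
```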

It’s actually broken down into two separate projects:

  • python-code-analyzer, a Javascript library that doesn’t have any WordPress dependencies. It uses React but can in principle be integrated with any other Javascript code. This does the bulk of the work.
  • wp-python-analyzer, a WordPress package that wraps the tools up into blocks that can be deployed into the Gutenberg editor.

For what it’s worth the Python execution is done using Skulpt, an existing open-source Javascript library for executing Python code. Skulpt has its problems, but it seems pretty capable for educational purposes.

New in Python 3.7: The breakpoint() function

I started out trying to pick through the interesting features in Python 3.7 but ended up leaving it so long that Python 3.8 is already out. Even so, there’s a feature that I genuinely only noticed a week ago, and it’s a small but significant one.

For as long as I’ve known how to write Python my standard tool when I’m frustrated with a unit test is to insert on the offending line:

import pdb; pdb.set_trace()

I never got used to managing breakpoints in my IDE because I’m often running something remotely or in a docker container and remote debugging is usually a pain to set up.

Still, it’s always bugged me that this has to be two lines of code.

Luckily someone else has had the same thoughts as me, and more to the point has got round to doing something about it. From Python 3.7 onwards, there’s a built-in function that allows you to do this in one line:

breakpoint()

And that’s all there is to it.

Digging a bit deeper

It turns out that the code:

breakpoint()

does a little more than the old PDB snippet above, as the documentation helpfully explains:

This function drops you into the debugger at the call site. Specifically, it calls sys.breakpointhook(), passing args and kws straight through. By default, sys.breakpointhook() calls pdb.set_trace() expecting no arguments. In this case, it is purely a convenience function so you don’t have to explicitly import pdb or type as much code to enter the debugger. However, sys.breakpointhook() can be set to some other function and breakpoint() will automatically call that, allowing you to drop into the debugger of choice.

So sys.breakpointhook is made available as a writable value, and you can assign your own choice of function over it. This isn’t very useful as a developer, but it’s vital if you’re an IDE writer. When you run Python in an IDE, you might want to provide your own suite of debug tools without PDB getting in the way. If someone’s using your IDE to debug their code and they type breakpoint(), they probably want the IDE breakpoint feature and not PDB.
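Here’s a sketch of a toy hook standing in for an IDE’s debugger (my_debugger is a made-up name):

```python
import sys

calls = []

def my_debugger(*args, **kwargs):
    # Stand-in for an IDE's debugger entry point: just record the call.
    calls.append((args, kwargs))
    print("custom debugger invoked")

# Replace the default hook, which would call pdb.set_trace().
sys.breakpointhook = my_debugger

breakpoint()  # now invokes my_debugger instead of dropping into pdb

# Restore the original behaviour.
sys.breakpointhook = sys.__breakpointhook__
```

PEP 553 also added a PYTHONBREAKPOINT environment variable: setting PYTHONBREAKPOINT=0 disables breakpoint() calls entirely, which is handy if one slips into production.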

This raises an interesting question of whether this could be used maliciously. Code could write to sys.breakpointhook to make itself more difficult to debug, or it could check the value of it in order to exhibit different behaviour when run in a non-PDB debugger. This is pretty limited though, since anyone having trouble with an IDE debugger will probably fall back to trying PDB pretty quickly and defeat this simple trick.

New in Python 3.7: Context Variables

This is the first in a series of articles that will look at new features introduced in Python 3.7. I don’t know about anyone else, but I tend only to discover new language features by chance when I come across them on StackOverflow or whatever. I figure a more deliberate process of reading the docs might do me some good, and might help other people as well.

The first feature that caught my eye was Context Variables.

The problem

Sometimes a library has some kind of hidden state. This usually makes things more convenient for the user, e.g. setting precision in Decimals:

from decimal import *
getcontext().prec = 6
print(Decimal(1) / Decimal(7))
# Prints '0.142857'
getcontext().prec = 28
print(Decimal(1) / Decimal(7))
# Prints '0.1428571428571428571428571429'

Once you’ve written to prec, the precision is remembered until the next time you change it. If you didn’t have this, you’d need some way to specify the precision in every call, perhaps something like this (a made-up API, purely for illustration):

print(divide(Decimal(1), Decimal(7), precision=6))  # hypothetical

Even if you could figure out a nice API for that, your code would have to pass the context around everywhere it was needed. It’s nicer if the library remembers it for you.

The problem with this is what happens if multiple threads are using the library. If you’re not careful, one thread alters the state just before another thread prints a Decimal, and that thread ends up with the wrong precision. Worse, the result depends on exactly the order in which the two threads execute, so the behaviour is effectively random.

Of course, no decent library has this problem with threads. The simple way around it is to have thread-local state: if I call decimal.getcontext() it will return me a value that is only used by the active thread, and if I change it it will only affect my thread.
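You can see the thread-local behaviour directly. In this sketch each thread sets its own precision without disturbing the other (the thread names are arbitrary):

```python
import threading
from decimal import Decimal, getcontext

results = {}

def worker(name, prec):
    # Each thread has its own decimal context, so this doesn't
    # affect any other thread's precision.
    getcontext().prec = prec
    results[name] = str(Decimal(1) / Decimal(7))

threads = [
    threading.Thread(target=worker, args=("low", 6)),
    threading.Thread(target=worker, args=("high", 28)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```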

However, things get more complicated once we are working with asynchronous code. Consider a couple of asynchronous functions:

import asyncio

async def db_fetch(stuff):
    # Simulate a slow query...
    await asyncio.sleep(1)
    # Maybe do something with Decimal context here?
    return 42

async def cache_fetch(stuff):
    # Also slow...
    await asyncio.sleep(1)
    # Maybe do something with Decimal context here?
    return 43

async def combine():
    first = db_fetch('select * from foo')
    second = cache_fetch('cachekey')
    return await asyncio.gather(first, second)

# This works in Jupyter. YMMV if you're running it elsewhere without an event
# loop running...
print(await combine())

There are no threads in this code. But because it’s written asynchronously, the parts of the code in db_fetch and cache_fetch might get executed in different orders. If the two coroutine functions were doing real work (not just pretending to work), then execution might switch back and forward between the two functions several times as they are working, and the exact pattern would depend on exactly how quickly the DB and the cache returned.

So we can no longer rely on thread-local storage, because even though there is only one thread we are still switching between two areas of the code, and they may change the state in ways that affect each other.

The solution

When coroutines are run concurrently by Python, they are internally wrapped into instances of asyncio.Task. A Task is the basic unit at which execution is scheduled: when control passes from one coroutine to another (because one is blocked and the other gets a chance to run) this is actually handled by calling the _step function on the appropriate task.

The Task class is modified to capture a context on creation, and activate that context each time control returns to that Task:

class Task:
    def __init__(self, coro):
        ...
        # Get the current context snapshot.
        self._context = contextvars.copy_context()
        self._loop.call_soon(self._step, context=self._context)

    def _step(self, exc=None):
        ...
        # Every advance of the wrapped coroutine is done in
        # the task's context.
        self._loop.call_soon(self._step, context=self._context)

call_soon is an asyncio function that causes a function to be asynchronously called later.
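The context= argument to call_soon is what makes the snapshot take effect later. Here’s a small sketch of the same mechanism outside of Task:

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var')
seen = []

var.set('captured')
snapshot = contextvars.copy_context()  # what Task does on creation
var.set('changed afterwards')

def callback():
    # Runs inside the snapshot, so it sees the earlier value.
    seen.append(var.get())

loop = asyncio.new_event_loop()
loop.call_soon(callback, context=snapshot)
loop.call_soon(loop.stop)
loop.run_forever()
loop.close()

print(seen)
```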

But what’s actually in the context?

You can think of it as a collection of variable states, essentially like a namespace dict, except that the lookup isn’t done by name (which would raise the possibility of name clashes).

A library that wants to have asynchronous context declares a context variable:

from contextvars import ContextVar

my_state = ContextVar('my_state')

The my_state variable is now a handle that we can use to look up a value in the context, and get and set the value. The value can be any Python value, so you can put a dict or an object or whatever.
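A quick sketch of the handle in action; set() returns a token that reset() can use to restore the previous value:

```python
from contextvars import ContextVar

my_state = ContextVar('my_state', default='initial')

print(my_state.get())                    # prints: initial

token = my_state.set({'user': 'alice'})  # any Python value works
print(my_state.get())                    # prints: {'user': 'alice'}

my_state.reset(token)                    # restore the value from before set()
print(my_state.get())                    # prints: initial
```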

Code that may run in an asynchronous context will read the value of the context variable any time it needs it like this:

value = my_state.get()

Behind the scenes, this is getting the value of the my_state variable in the currently active context (which was switched in just before asyncio passed control to the task’s _step method). Therefore the library code can safely read and write this value without interfering with other asynchronous tasks that might be using the same library.

Any time a new Task is created, the context is copied so that the task has its own copy of the context.

The mapping in the Context is an immutable dictionary. This means that copying the context once per Task is still cheap. Most of the time the code won’t actually change the context (or at least, won’t change all the variables in the context) so the unchanged variables can continue to be shared between contexts. Only as and when they are written is a cost incurred, and in this case it’s a necessary cost.

If your code makes use of several libraries that use context variables, they will all be storing their values in the same context. This is OK, since the libraries will have different handle objects (the object returned from ContextVar()) so they can’t accidentally overwrite each other’s state.
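Putting it all together, here’s a sketch showing two tasks keeping separate values even as they interleave (request_id is just an illustrative name):

```python
import asyncio
from contextvars import ContextVar

request_id = ContextVar('request_id')

async def handler(rid):
    request_id.set(rid)
    # Yield control so the two tasks interleave.
    await asyncio.sleep(0.01)
    # Each task still sees its own value, not the other task's,
    # because each Task got its own copy of the context.
    return request_id.get()

async def main():
    return await asyncio.gather(handler('a'), handler('b'))

result = asyncio.run(main())
print(result)
```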


Context variables are worth knowing about. If you’re tempted to reach for thread-local state, the answer should almost always be a context variable instead, unless you’re writing internal code that you know will never run asynchronously and will never be published to other people who might use it asynchronously. In practice, that means essentially all code using thread-local state should use context variables instead.

The internals are a bit hairy to think about, but the public interface looks really nice and simple.

What happens when a class is created?

This post will dig into what happens when a new class is created, at a bytecode level. This isn’t too interesting on its own, but one of my other posts that’s in the works has ended up being far too long, and so I’m breaking this out as a chunk that I can refer to.

As you may already know, class definitions in Python are executed like any other code. This means that you can execute arbitrary code in your definition if you so wish:

class Dog:
    print("In the process of creating class")

    def bark(self):
        print("woof")

This prints the message "In the process of creating class" once and only once, when the module that defines the class is first imported.

You can also conditionally define a class:

import random

if random.choice([True, False]):
    class Dog:
        def bark(self):
            print("woof")

If you do this, then 50% of the time it will give you a working class, and 50% of the time you’ll get a NameError if you try to use the class. This is stupid, but the same trick can be genuinely useful for OS-dependent classes that shouldn’t be available if the underlying OS doesn’t support them.

So what’s actually going on in the bytecode? Let’s disassemble it and find out. I’m going to assume that you already know the basics of how Python bytecode works. If not, you can see my article on it here.

First of all, we compile some code:

source = """
class Dog:
    def bark(self):
        print("woof")
"""
code = compile(source, '<string>', 'exec')

We can then disassemble it to find out what’s going on. If you’re using Python 3.6, you’ll get something like this:

import dis

dis.dis(code)

will print to STDOUT:

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

The basic form of this is reasonably familiar, but what’s the MAKE_FUNCTION talking about? This is code that makes a class, and yet the bytecode is making a function. And where’s the function bark? This actually is a function, but it’s nowhere in sight when MAKE_FUNCTION is being invoked.

Let’s break it down. We can see this as:

 2           0 LOAD_BUILD_CLASS

... some other stuff ...

             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

What this does is load a special built-in function called __build_class__, set up some arguments (which we skip over for now), call the function with two arguments on the stack, assign the result to the name Dog and then return None. So the interesting things to consider are:

  • What goes on inside __build_class__?
  • What are the arguments that we pass to it?

What does __build_class__ do?

__build_class__ isn’t documented anywhere, but it’s easy enough to find in the cpython source code. I won’t step through it line by line, but you can find it at Python/bltinmodule.c if you want to dig in to the details.

The __build_class__ function takes at least two arguments (a function plus a name string), with optional arguments after that for base classes. Let’s ignore base classes for now.

The interesting part of the __build_class__ function is this:

    cell = PyEval_EvalCodeEx(PyFunction_GET_CODE(func), PyFunction_GET_GLOBALS(func), ns,
                             NULL, 0, NULL, 0, NULL, 0, NULL,
                             PyFunction_GET_CLOSURE(func));

Here func is the function that was passed in to __build_class__, which is the mystery function we haven’t explained yet. The only other variable is ns, which is an empty Python dict.

This call evaluates some code. Specifically, it executes the code of the function func, in the context of the globals that func has access to, and using ns as the local namespace. The return value gets mostly ignored. If this function does anything at all, the interesting thing is in its side-effects on the dict ns.

Hint: side-effects on ns are very important to this.

After we’ve evaluated this mystery function, the ns dict is passed to the class’s metaclass. Metaclasses in Python get a bit confusing, so for now let’s ignore this detail and assume we’re using the default metaclass, which is type(). Therefore what we’re doing is calling:

   type("Dog", base_classes, ns)

You can think of this as a class instantiation: The class is type, and the instance we end up with is the Dog class. Dog is both a class and an instance: it’s a class for future instances like rex, rover and lassie, but is itself an instance of type.
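You can mirror that final step yourself by calling type directly with a name, a tuple of base classes and a namespace dict:

```python
def bark(self):
    print("woof")

# Roughly what __build_class__ ends up doing with the
# namespace it collected.
Dog = type("Dog", (), {"bark": bark})

rex = Dog()
rex.bark()                      # prints: woof
print(isinstance(rex, Dog))     # prints: True
print(isinstance(Dog, type))    # prints: True
```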

What are the arguments we pass in to __build_class__?

We’ve figured out that __build_class__ takes a mystery function, evaluates it for side effects, then creates an instance of type using the resultant namespace. But what is the mystery function?

Let’s look again at that disassembly:

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               1 ('Dog')
             10 CALL_FUNCTION            2
             12 STORE_NAME               0 (Dog)
             14 LOAD_CONST               2 (None)
             16 RETURN_VALUE

Specifically, we’ll look at the bit we skipped over before:

              2 LOAD_CONST               0 (<code object Dog at 0x7f42d20b6c00, file "<string>", line 2>)
              4 LOAD_CONST               1 ('Dog')
              6 MAKE_FUNCTION            0

When MAKE_FUNCTION is called with an argument of 0, it’s the simplest case: it takes only two arguments. The two arguments are a code object and a name. So if we want to know about the function we’re creating (and ultimately calling inside of __build_class__) we need to look inside this code object to see what it’s doing.

The code object is loaded with LOAD_CONST 0, which means that we can find it in the tuple of constants associated with this code block:

dis.dis(code.co_consts[0])

which prints:

  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('Dog')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 (<code object bark at 0x7fb6c7b77930, file "<string>", line 3>)
             10 LOAD_CONST               2 ('Dog.bark')
             12 MAKE_FUNCTION            0
             14 STORE_NAME               3 (bark)
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE

Now we’re getting somewhere. Suddenly this looks a bit more like the inside of a class definition. We’re loading up our method object and giving it the name "bark". The actual code for the method isn’t visible here, it’s stored in a constant nested inside the code object:

dis.dis(code.co_consts[0].co_consts[1])

which prints:

  4           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('woof')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

You should recognise this bit as the innards of this method:

    def bark(self):
        print("woof")

So what are we actually saying here?

It’s got a bit confusing. Functions within functions within functions.

I think the key is to think of this class definition:

class Dog:
    def bark(self):
        print("woof")

as actually being a function dressed up:

def _Dog_namespace_creator():
    def bark(self):
        print("woof")

Creating the class works something like this:

  • Compile a function that creates the Dog namespace (I’m calling this _Dog_namespace_creator for clarity, though it isn’t really called that).
  • Execute this function, and keep hold of the resultant namespace. Remember that a namespace is just a dictionary. In our case, the namespace after executing this function contains one member, a function called bark.
  • Create an instance of type (or some other metaclass) using this namespace. The class will therefore have a method called bark in its namespace.

This is all done once, when the class is defined. None of this stuff needs to happen again when the class is instantiated.
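Those steps can be sketched in plain Python. The _Dog_namespace_creator name is made up as before, and locals() here is only a rough stand-in for how the real machinery captures the namespace:

```python
def _Dog_namespace_creator():
    # The class body, executed as ordinary function code.
    def bark(self):
        print("woof")
    # The real machinery captures the local namespace as a
    # side effect; locals() is a rough stand-in for that.
    return dict(locals())

ns = _Dog_namespace_creator()
Dog = type("Dog", (), ns)

Dog().bark()  # prints: woof
```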

Hmm. What’s the really short version?

The body of a class definition is a lot more like a function than you think. It actually is a function, one that is executed when the class is defined and builds the members of the class.