New in Python 3.7: Context Variables

This is the first in a series of articles that will look at new features introduced in Python 3.7. I don’t know about anyone else, but I tend only to discover new language features by chance when I come across them on StackOverflow or whatever. I figure a more deliberate process of reading the docs might do me some good, and might help other people as well.

The first feature that caught my eye was Context Variables.

Motivation

Sometimes a library has some kind of hidden state. This usually makes things more convenient for the user, e.g. setting precision in Decimals:

from decimal import Decimal, getcontext
 
getcontext().prec = 6
print(Decimal(1) / Decimal(7))
# Prints '0.142857'
 
getcontext().prec = 28
print(Decimal(1) / Decimal(7))
# Prints '0.1428571428571428571428571429'

Once you’ve written to prec, the precision is remembered until the next time you change it. Without this, you’d need to specify the precision explicitly in each call, for example by carrying a Context object around and calling its methods:

from decimal import Context, Decimal
print(Context(prec=6).divide(Decimal(1), Decimal(7)))

Even with an API like that, your code would have to pass the context around everywhere it was needed. It’s nicer if the library remembers it for you.

The problem is what happens when multiple threads use the library. If you’re not careful, one thread alters the state and then another thread prints a Decimal and ends up with the wrong precision. Worse, the result depends on the exact order in which the two threads execute, so the behaviour is nondeterministic.

Of course, no decent library has this problem with threads. The simple way around it is to have thread-local state: if I call decimal.getcontext() it will return me a value that is only used by the active thread, and if I change it it will only affect my thread.
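Here is a small sketch of that per-thread behaviour using the decimal module itself. The worker function and the results dict are just names I made up for the demonstration; the only library calls are getcontext() and threading.Thread.

```python
import threading
from decimal import Decimal, getcontext

results = {}
main_prec_before = getcontext().prec


def worker(name, prec):
    # getcontext() returns the context belonging to the calling thread,
    # so this assignment is invisible to the other threads.
    getcontext().prec = prec
    results[name] = str(Decimal(1) / Decimal(7))


threads = [
    threading.Thread(target=worker, args=("low", 3)),
    threading.Thread(target=worker, args=("high", 10)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread's precision is untouched by either worker.
main_prec_after = getcontext().prec
print(results, main_prec_after)
```

Each worker gets the precision it asked for, and the main thread keeps whatever precision it had before the workers ran.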

However, things get more complicated once we are working with asynchronous code. Consider a couple of asynchronous functions:

import asyncio
 
async def db_fetch(stuff):
    # Simulate a slow query...
    await asyncio.sleep(1)
    # Maybe do something with Decimal context here?
    return 42
 
async def cache_fetch(stuff):
    # Also slow...
    await asyncio.sleep(1)
    # Maybe do something with Decimal context here?
    return 43
 
async def combine():
    first = db_fetch('select * from foo')
    second = cache_fetch('cachekey')
    return await asyncio.gather(first, second)
 
# asyncio.run() (new in 3.7) starts an event loop for us. In Jupyter, where a
# loop is already running, use `print(await combine())` instead.
print(asyncio.run(combine()))

There are no threads in this code. But because it’s written asynchronously, the parts of the code in db_fetch and cache_fetch might get executed in different orders. If the two coroutine functions were doing real work (not just pretending to work), then execution might switch back and forth between the two functions several times as they are working, and the exact pattern would depend on how quickly the DB and the cache returned.

So we can no longer rely on thread-local storage, because even though there is only one thread we are still switching between two areas of the code, and they may change the state in ways that affect each other.
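To make that concrete, here is a minimal sketch of two coroutines stepping on each other through thread-local storage. The _state object and the worker function are hypothetical stand-ins for a library’s hidden state, not anything from a real library.

```python
import asyncio
import threading

# Hypothetical library state kept in thread-local storage.
_state = threading.local()


async def worker(name, results):
    _state.owner = name           # write "our" setting
    await asyncio.sleep(0)        # give the other coroutine a turn
    results[name] = _state.owner  # read it back once we are resumed


async def main():
    results = {}
    await asyncio.gather(worker("a", results), worker("b", results))
    return results


results = asyncio.run(main())
print(results)
```

Both coroutines run on the same thread, so they share one _state. By the time worker "a" is resumed, worker "b" has already overwritten the value, and "a" reads back "b".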

The solution

When coroutines are run concurrently by Python, they are internally wrapped into instances of asyncio.Task. A Task is the basic unit at which execution is scheduled: when control passes from one coroutine to another (because one is blocked and the other gets a chance to run) this is actually handled by calling the _step function on the appropriate task.

The Task class is modified to capture a context on creation, and activate that context each time control returns to that Task:

class Task:
    def __init__(self, coro):
        ...
        # Get the current context snapshot.
        self._context = contextvars.copy_context()
        self._loop.call_soon(self._step, context=self._context)
 
    def _step(self, exc=None):
        ...
        # Every advance of the wrapped coroutine is done in
        # the task's context.
        self._loop.call_soon(self._step, context=self._context)

call_soon is an event loop method that schedules a callback to run on the next iteration of the loop. Its context argument, added in 3.7, selects the Context the callback runs in.
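As a rough illustration of what the context argument to call_soon does, the sketch below schedules the same kind of callback twice, once with an explicit Context and once without. The names var and seen are made up for the example.

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="unset")
seen = []


async def main():
    loop = asyncio.get_running_loop()

    # Take a copy of the current context and set the variable only
    # inside that copy.
    ctx = contextvars.copy_context()
    ctx.run(var.set, "inside ctx")

    # This callback runs in the copied context...
    loop.call_soon(lambda: seen.append(var.get()), context=ctx)
    # ...and this one runs in a copy of the caller's current context.
    loop.call_soon(lambda: seen.append(var.get()))

    # Let the loop run the scheduled callbacks.
    await asyncio.sleep(0.01)


asyncio.run(main())
print(seen)
```

The first callback sees the value that was set inside ctx; the second sees the default, because the variable was never set in the context that main itself is running in.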

But what’s actually in the context?

You can think of it as a collection of variable states, essentially like a namespace dict, except that the lookup isn’t done by name (which would raise the possibility of name clashes).

A library that wants to have asynchronous context declares a context variable:

from contextvars import ContextVar

my_state = ContextVar('my_state')
my_state.set('apple')

The my_state variable is now a handle that we can use to look up a value in the context, and get and set the value. The value can be any Python value, so you can put a dict or an object or whatever.

Code that may run in an asynchronous context will read the value of the context variable any time it needs it like this:

my_state.get()

Behind the scenes, this gets the value of the my_state variable in the currently active context (the one asyncio switched to just before passing control to the task’s _step method). Therefore the library code can safely read and write this value without interfering with other asynchronous tasks that might be using the same library.
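Here is a small sketch of that isolation in practice. The precision variable is a hypothetical piece of library state (it has nothing to do with the real decimal module); each worker sets it, yields, and reads it back.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical library state, with a default for code that never sets it.
precision = ContextVar("precision", default=28)


async def worker(prec):
    precision.set(prec)     # only affects this task's context
    await asyncio.sleep(0)  # give the other task a turn
    return precision.get()  # still our own value


async def main():
    # gather() wraps each coroutine in its own Task.
    return await asyncio.gather(worker(6), worker(10))


isolated = asyncio.run(main())
print(isolated)
print(precision.get())
```

Each worker reads back the value it set itself, and outside the event loop the variable still has its default.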

Any time a new Task is created, the context is copied so that the task has its own copy of the context.
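The copy works in one direction only, which the following sketch tries to show: a child task starts out with the values its parent had when the task was created, but anything the child sets stays in the child. The request_id variable is a made-up example.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical per-request state.
request_id = ContextVar("request_id", default=None)


async def child():
    inherited = request_id.get()  # value copied from the parent
    request_id.set("child")       # change is local to this task
    return inherited


async def parent():
    request_id.set("parent")
    # The current context is copied when the Task is created.
    task = asyncio.create_task(child())
    inherited = await task
    return inherited, request_id.get()


outcome = asyncio.run(parent())
print(outcome)
```

The child sees "parent", and the parent still sees "parent" after the child has finished, even though the child wrote a different value.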

The mapping in the Context is an immutable dictionary. This means that copying the context once per Task is still cheap. Most of the time the code won’t actually change the context (or at least, won’t change all the variables in the context) so the unchanged variables can continue to be shared between contexts. Only as and when they are written is a cost incurred, and in this case it’s a necessary cost.
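You can poke at this snapshot behaviour without asyncio at all, using copy_context() and Context.run() directly. This is only a sketch with a made-up colour variable, and it says nothing about how the sharing is implemented internally.

```python
from contextvars import ContextVar, copy_context

colour = ContextVar("colour", default="red")

colour.set("green")
snapshot = copy_context()  # captures colour == 'green'
colour.set("blue")         # later writes don't touch the snapshot

snap_value = snapshot[colour]  # a Context can be read like a mapping
current_value = colour.get()


def change_and_read():
    # Runs with the snapshot as the active context.
    colour.set("yellow")
    return colour.get()


inside = snapshot.run(change_and_read)
after_run = colour.get()          # the outer context is unchanged
snap_after_run = snapshot[colour] # the write landed in the snapshot
print(snap_value, current_value, inside, after_run, snap_after_run)
```

Writes made while a context is active stay in that context, so the snapshot and the outer context never see each other’s changes.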

If your code makes use of several libraries that use context variables, they will all be storing their values in the same context. This is OK, since the libraries will have different handle objects (the object returned from ContextVar()) so they can’t accidentally overwrite each other’s state.
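A quick sketch of why names don’t matter: two hypothetical libraries both call their variable 'state', and they still don’t clash, because the lookup is keyed on the ContextVar object rather than on its name.

```python
from contextvars import ContextVar, copy_context

# Two imaginary libraries that happen to pick the same name.
lib_a_state = ContextVar("state")
lib_b_state = ContextVar("state")

lib_a_state.set("a")
lib_b_state.set("b")

values = (lib_a_state.get(), lib_b_state.get())

# Both variables live side by side in the same context.
ctx = copy_context()
both_present = lib_a_state in ctx and lib_b_state in ctx
same_name = lib_a_state.name == lib_b_state.name
print(values, both_present, same_name)
```

The name passed to ContextVar() is there for debugging and introspection; it plays no part in finding the value.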

Conclusion

Context variables are worth knowing about. I guess that if you’re tempted to use thread-local state the answer should always be to use a context variable instead, unless you’re writing internal code where you know that it won’t be used asynchronously or published to be used by other people who may use it asynchronously. In practice that probably means that all code using thread-local state should use context variables instead.

The internals are a bit hairy to think about, but the public interface looks really nice and simple.
