Python is a pretty regular language; it follows patterns. Most of the time if you can do something in a function, you can do it in global scope or in a class definition or vice versa. For example, you can execute arbitrary code while defining a class:
class Foo:
    print("defining foo")

    def method(self):
        return 1
Likewise, you can define classes or import modules from within a function:
def foo():
    import math
    print(math.sqrt(16))

    class Foo:
        def method(self):
            return 1

    return Foo()
This kind of thing is sometimes useful, and sometimes confusing. But for better or worse, it’s part of the philosophy of Python. When new features get added to Python, they tend to retain this kind of regularity.
Which makes it all the more odd that there’s (at least) one thing you can’t do in function scope that you can do in global scope: use import *. This function:
def do_sqrt(x):
    from math import *
    return sqrt(x)
gives the error:
SyntaxError: import * only allowed at module level
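Note that this is a compile-time error: the function is rejected before it is ever called. A minimal sketch to confirm that, compiling the source without executing it (the names source and error_message are only for illustration):

```python
# compile() only parses and compiles; the function body is never run.
source = """
def do_sqrt(x):
    from math import *
    return sqrt(x)
"""

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    error_message = exc.msg
else:
    error_message = None

print(error_message)
```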
The reason why turns out to be interesting.
Lookup in global scope
I’ve explained elsewhere that the global scope is a dict, and that import makes additions to that dict. When you import a module at global scope, Python creates a new entry in the global dict with the name of the module you just imported (or its alias if you did an import ... as ...).
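This can be watched happening with a plain dict standing in for a module’s global scope. The sketch below assumes nothing beyond the built-in exec; the dict name namespace is made up for the example.

```python
namespace = {}  # plays the role of a module's global dict

exec("import math", namespace)       # adds an entry called "math"
exec("import math as m", namespace)  # adds an entry called "m"

print("math" in namespace)                  # True
print(namespace["m"] is namespace["math"])  # True
```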
When you do from foo import *, something slightly more complicated happens. Python looks in the imported module, and adds all the public global members of that module (the names listed in __all__ if the module defines it, otherwise every name that doesn’t start with an underscore) as members of the current module’s global dict. Importantly, it adds only those members that exist at the time the import is carried out. The contents of a module change as the module is processed, by importing submodules and defining classes and functions. The import * process freezes the view of the module at the moment the import takes place. This is a constant source of confusion for newbies who over-use import * and then introduce circular dependencies.
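Here is a small sketch of that snapshot behaviour. It builds a throwaway module by hand (snapshot_demo is a hypothetical name, registered in sys.modules only for the demonstration) and star-imports it into a dict that stands in for a global scope. Names starting with an underscore are skipped, and a name added after the import never arrives.

```python
import sys
import types

# A hand-made module; "snapshot_demo" is a made-up name for this example.
demo = types.ModuleType("snapshot_demo")
demo.early = 1
demo._private = 2
sys.modules["snapshot_demo"] = demo

namespace = {}  # stands in for the importing module's global dict
exec("from snapshot_demo import *", namespace)

demo.late = 3  # defined only after the star-import has run

copied = {name for name in namespace if name != "__builtins__"}
print(copied)  # {'early'}

del sys.modules["snapshot_demo"]  # tidy up the fake module
```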
To reiterate a point I’ve made elsewhere: this means that when you use a name (variable / class / module / function) in global scope, you’re doing a lookup in a dict. Python takes the variable name, treats it as a string and looks up the value in the global values dict.
This is nice and simple and regular. But it’s also slow.
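A rough way to see the dict lookup in action, again using a plain dict as the global scope. The second half peeks at the compiled module-level code; the exact set of opcodes varies between CPython versions, so treat this as a sketch rather than a specification.

```python
import dis

# A value stored under a string key is visible to code as a plain name.
namespace = {"greeting": "hello"}
exec("result = greeting.upper()", namespace)
print(namespace["result"])  # HELLO

# Module-level code reads and writes names with dict-based instructions.
module_code = compile("result = greeting.upper()", "<example>", "exec")
opnames = {ins.opname for ins in dis.get_instructions(module_code)}
print("LOAD_NAME" in opnames, "STORE_NAME" in opnames)  # True True
```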
Lookup in function scope
To see what goes on inside a function, we’ll need to do some bytecode disassembly. See here if you need some background on Python bytecode.
Take a simple function like this:
def add_multiply(a, b, c):
    return (a + b) * c
And disassemble it:
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 LOAD_FAST                2 (c)
              8 BINARY_MULTIPLY
             10 RETURN_VALUE
If you’re familiar with the basics of bytecode this should be easy to follow: Load a and b onto the stack, then add the two top stack elements. Load c onto the stack, then multiply the top two stack elements. Then return the top of the stack.
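If you want to reproduce a listing like this yourself, the standard dis module will do it. Newer CPython releases may print different opcode names (for example BINARY_OP instead of BINARY_ADD, or combined variants of LOAD_FAST), so the output may not match the listing exactly.

```python
import dis


def add_multiply(a, b, c):
    return (a + b) * c


# Print a human-readable disassembly of the function's bytecode.
dis.dis(add_multiply)

# The same instructions as data, in case you want to inspect them in code.
opnames = [ins.opname for ins in dis.get_instructions(add_multiply)]
```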
Look closer at the loading, though. The instruction is LOAD_FAST, and according to the docs this instruction:
Pushes the value associated with co_varnames[var_num] onto the stack.
var_num is the integer argument given to the instruction. In the instruction:
2 LOAD_FAST 1 (b)
the 2 is the byte offset of the instruction from the start of the function’s bytecode. The 1 is the argument given to the LOAD_FAST instruction. The (b) doesn’t really exist in the code, it’s just something the disassembler prints in order to help out the human reader.
So in this case, the instruction “pushes the value associated with co_varnames[1] onto the stack”.
Let’s look at co_varnames:
>>> print(add_multiply.__code__.co_varnames)
('a', 'b', 'c')
So we’re loading “the value associated with b”, which sounds like what we want. But it raises the question of what “the value associated with” means here.
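The link between the numeric argument and the name can be checked directly on the code object. This is just a sketch: slots is a name invented for the example, while co_varnames is the real attribute.

```python
def add_multiply(a, b, c):
    return (a + b) * c


code = add_multiply.__code__

# Pair each argument number with the local variable name it refers to.
slots = dict(enumerate(code.co_varnames))
print(slots)                # {0: 'a', 1: 'b', 2: 'c'}
print(code.co_varnames[1])  # b
```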
I think the best way to picture a function is as a piece of code with a fixed set of numbered buckets attached:
The function we’ve defined has three buckets: a, b and c. They are as much a part of the function as the code is, and they are defined when the function is compiled (and can’t be added or removed afterwards). When we have code that says something like “add a to b”, Python can transform it into “add the contents of the first bucket to the contents of the second bucket”. Since these buckets are always in the same place and they are referred to directly by number, they are much faster to look up than it would be to find the member in a dictionary.
A function can have any number of these buckets, but the number is determined when the function is compiled and never changes.
This works because the function’s code is compiled all at once, and it’s not possible to add new variables after it’s compiled. Think about it: you can’t refer to a variable that’s not mentioned in the code somewhere at the point when you compile the code.
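A short sketch of both halves of that claim, using made-up function names. The first function shows that buckets exist as soon as the function is compiled, before it has ever run. The second tries to smuggle in a new local with exec and fails, because no bucket was allocated for the name.

```python
def with_local(a):
    total = a + 1
    return total


# Both buckets are already known, although with_local has not been called.
print(with_local.__code__.co_varnames)  # ('a', 'total')
print(with_local.__code__.co_nlocals)   # 2


def try_to_add_local():
    # "surprise" is never assigned in this function's own code, so the
    # compiler gave it no bucket and treats the name as a global instead.
    exec("surprise = 1")
    try:
        return surprise
    except NameError:
        return "no bucket for surprise"


outcome = try_to_add_local()
print(outcome)  # no bucket for surprise
```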
In light of this, we can see why we can’t use import * at function scope. If an import * were executed, Python would have to create a bucket for each member of the imported module in order that code could use LOAD_FAST to get these values. But at the point where the function is compiled, there’s no way to know what will be present in the imported module, so no way to pre-allocate buckets for the members. The module might not even be loaded when the function is compiled; it might not even exist (perhaps if your code uses an optional extension that isn’t available on all platforms, and it’s convenient to define the function but not call it on platforms where the extension isn’t supported).
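To make the contrast concrete, here is a sketch with an ordinary import of a module that does not exist (the module name is deliberately made up). The function still compiles, because the one name it binds is spelled out in the source and can be given a bucket. The import only fails when the function is called.

```python
def use_optional_extension():
    # Hypothetical module name; it is assumed not to be installed anywhere.
    import optional_extension_that_is_not_installed
    return optional_extension_that_is_not_installed


# Defining the function worked, and the bucket for the name already exists.
bucket_names = use_optional_extension.__code__.co_varnames
print(bucket_names)  # ('optional_extension_that_is_not_installed',)

try:
    use_optional_extension()
except ImportError as exc:
    failure = type(exc).__name__
else:
    failure = None

print(failure)
```

With import * there is no such list of names in the source, so there is nothing for the compiler to allocate.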
Is it theoretically possible to work around this and make import * work at function scope? I think so. Python could generate a LOAD_FAST for variable names it recognises, and a slower LOAD_NAME for all other variables. But this would mean creating a new dict on each function call (it’s a fresh scope each time the function is called, not just when a new function is defined) which is unnecessarily wasteful for a feature that isn’t very useful. Python makes the pragmatic decision not to support this, even at the loss of regularity.
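You can imitate the dict-based approach by hand to get a feel for the cost. This is only an illustration of the idea, not how Python would implement it; star_names is a name invented for the example.

```python
import math


def do_sqrt(x):
    # Built afresh on every call: one dict entry per public name in math.
    star_names = {
        name: value
        for name, value in vars(math).items()
        if not name.startswith("_")
    }
    # Every use of an imported name becomes a dict lookup by string.
    return star_names["sqrt"](x)


print(do_sqrt(16))  # 4.0
```

In practice, importing the module inside the function and writing math.sqrt(x), or importing just the names you need, gets the same result without the per-call dict.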