InternalsPython basics: dict lookup

Python basics: dict lookup

I had a bunch of posts I wanted to write that would refer to dict lookup in various ways, and rather than repeat myself in those I thought it would be useful to write a single post that establishes some of the basics. You may already know this stuff, in which case please ignore it.

Dicts are everywhere in Python, and lots of other operations are built out of them. Using dicts is what makes Python so flexible. Dicts aren’t just used by you when you’re writing your application, they are also used internally to implement a bunch of key Python features. It’s probably not obvious what I’m talking about; bear with me here.

What is a dict

You’re almost certainly familiar with using a dict explicitly in Python:

phone_number_lookup = {
    'alice': '555-4321',
    'bob': '555-1234'
}
 
print("Alice's phone number is %s" % phone_number_lookup['alice'])

There are a few properties of dictionaries that account for their wide use in Python:

  • You can look up an element in a dictionary quickly. I’m deliberately going to be vague about what “quickly” means here. The important thing is that it’s fast across a wide range of circumstances: it doesn’t get significantly slower when the dictionary has a lot of stuff in it, or when the keys or values are big values.
  • You can store anything as the values in a dictionary. Strings, numbers, classes, functions, absolutely anything that Python can work with.
  • You can use lots of different types (but not everything) as the keys in a dictionary. Most importantly for our purposes, dictionaries work very well with strings as keys.
  • Dictionaries don’t have any fixed ordering of keys.

About that ordering

It might seem surprising that one of the advantages I listed was a lack of ordering, which sounds like a disadvantage.

First, I’ll expand a little on what I mean here:

>>> mydict = {}
>>> mydict['apple'] = 100
>>> mydict['banana'] = 200
>>> mydict['cherry'] = 300
>>> for key, value in mydict.items():
...     print(key, value)
... 
('cherry', 300)
('apple', 100)
('banana', 200)

The order it prints in isn’t the order they were inserted. It’s not alphabetical ordering. In fact, it’s not any particular ordering you might want. It’s just what’s most convenient for Python.

In fact, this ordering will change depending on the version of Python you use (the above was done on cpython 2.7, for reasons I’ll go into elsewhere). It could even vary depending on what day you run the program, or what computer you run it on. The point is, you shouldn’t be making any assumptions.

This might not sound like much of an advantage, but in fact by refusing to specify details like this there’s more flexibility to change the implementation. If there’s a bunch of code out there that relies on a particular dict ordering (say it requires that the keys are always returned in alphabetical order) then it might be impossible to improve the internal implementation without breaking a lot of code.

How dicts are used

You may have spotted that when you create an instance of a class, you can assign arbitrary members to it if you want:

class MyClass:
    def my_method(self, x):
        return x + 1
 
foo = MyClass()
foo.not_originally_there = 100

The member not_originally_there doesn’t belong to the class, but it needs to be stored somewhere on the object. This is achieved by each object having it’s own dict to store these ad hoc members:

>>> print(foo.__dict__)
{'not_originally_there': 100}

Hang on a minute. Objects have a dict so that we can look up any members that were added after the object was created, and don’t belong to the class (that’s our not_originally_there above). But what about the members of the class? They have to be stored somewhere. That’s right, they’re in a dict:

>>> print(MyClass.__dict__)
{'__module__': '__main__',
 'my_method': <function MyClass.my_method at 0x7f0c54cfe598>,
 '__dict__': <attribute '__dict__' of 'MyClass' objects>,
 '__weakref__': <attribute '__weakref__' of 'MyClass' objects>,
 '__doc__': None}

Note that we can see all the members of MyClass, including the __dict__ member itself and a bunch of internal Python stuff. In particular, we can see that my_method is a function with an entry in the dictionary.

But there’s more than just that. If you create a module, then it has a bunch of members each of which has a name. These are stored in a dictionary:

# my_module.py
def foo(x):
    return x + 1
 
def bar(x):
    return x - 1
 
# At the command-line...
>>> import my_module
>>> print(my_module.__dict__)
[.... vast amounts of rubbish...]
'foo': <function foo at 0x7f8b32474ae8>, 'bar': <function bar at 0x7f8b32474b70>}

What about that import my_module line above? When that’s executed, we’re creating a new local name my_module that refers to the real module. A string name that refers to an object. What does that remind you of?

>>> print(globals())
{'__name__': '__main__',
 ... lots of other stuff ...
 'my_module': <module 'my_module' from '...'>
}

In other words, the global scope we import the module into is a dictionary. When we try to use a function or variable from global scope, it’s looked up in this dictionary to find the corresponding value. Of course, virtually all languages will have some way of mapping names to objects at some sort of “global” (maybe file or module) scope. Python is just unusual in exposing the details to you, and in consistently using the same data structure you’re using in your own code at runtime.

Fine, but what’s that all good for?

Using dicts everywhere doesn’t give a massive advantage; it’s more a matter of making things consistent and easy to reason about. However, there are a few nice things that come of it.

It makes for an import system that is very flexible. You can import a module as an object, or import some or all of the contents of a module directly. You can remap the names you import into different names as you do so. You can conditionally import modules (maybe depending on whether a certain module is available) and it will behave sensibly:

try:
    import simplejson as json
except ImportError:
    import json

Debugging and diagnostic tools can achieve a lot without much effort. If you want to peek into the state of an object, you can examine its dict and see all the data laid out for you in an easy way.

Another example are mock object libraries like unittest.mock. In this case, you want to replace some real piece of code with a mock implementation for the duration of your unit test. You want the existing test code to call what it thinks is real code, but have it call your instrumented test code instead. This is nice and natural in Python, because you can update the module dictionary to remap the name to point to your test code instead of the real code. The change takes effect immediately, and can be reversed at the end of the test.

A final point to note is that doing dict lookup in so many cases is one of the reasons why Python is slower than other languages. Lots of times (though not all the time) if you refer to a function or variable by name in Python you’re actually asking the runtime to do a dict lookup to find the value you’re talking about. Even if you use the same name several times in a function (perhaps in a loop), Python will end up doing the lookup each time you mention it. This is great for flexibility, but it can waste a lot of time. I’ll have a lot more to say about this later.

Categories: Internals

Comments

No Comments Yet. Be the first?

Post a comment

Your email address will not be published. Required fields are marked *