Dictionaries preserve ordering

I came across something unexpected when preparing my post about the basics of dictionaries in Python. For as long as I’ve been using Python, it’s been a fundamental truth that dictionaries don’t have a reliable ordering. It’s mentioned all over StackOverflow. The standard library even has an OrderedDict container for cases where you want your dictionary to have order.

Turns out it isn’t so.

I was trying to create an example that shows the fundamental way that dictionaries return elements in an order that has nothing (apparently) to do with the keys or the order in which you inserted them. Something like this:

fruits = {}
fruits["apple"] = 100
fruits["banana"] = 200
fruits["cherry"] = 300

I printed the dictionary. The fruits came back in alphabetical order. Dammit.

I guess I had bad luck and the keys just happened to be in the naive order. It’s a 1 in 6 chance, right?

fruits["damson"] = 400

Still in insertion order. Maybe something is up.

This actually isn’t anything mysterious: it’s right there in the documentation:

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was implementation detail of CPython from 3.6.

I’m on Ubuntu, so I’m getting Python 3.6 as my default Python 3. Mystery solved.

I don’t know about anyone else, but I was really surprised to see this change. I was assuming that making things more user-friendly like this would only come at the cost of making things slower, and there’s no way that Python would make things slower across the board.

Why wasn’t it always this way?

To recap some of the basics: Dictionaries don’t preserve ordering because they’re implemented using hash tables. A hash table is a data structure that gives very fast lookup regardless of the number of items in the container. It works like this:

Start with a so-called “hash function” that maps strings (or whatever key value you’re using—I’ll keep it simple here) into integers. To insert a value, apply the function to the key and then store the value in an array at that numbered position. Retrieving it again works the same way. The hash function is quick, and fetching an item from a specified position in an array is trivial. The great thing about it is that it’s equally fast no matter how many other elements are stored in the hash table (I’m skipping over a few details here, to keep things simple).

To be concrete, suppose:

We can store these entries in an 8-element array like this:

We can now easily look up any of the values from its key: hash the key, look up the corresponding position in the array, check that the key matches what we expect and then return the value.

But note that there’s no way to produce the keys in alphabetical order, without looking up all the keys and then sorting them (which is a costly operation). Nor can we get the keys in insertion order, because nothing about the structure shows us the order they were inserted; the image above would look the same whatever order you inserted the three values.

So why did it change?

One of the downsides of a hash function can be seen in the diagram above: generally there’s a bunch of empty space in it. This empty space isn’t zero cost: this is an array, meaning that it’s all one block of memory and the empty spaces are blank memory just taking up space. There are two ways to limit this overhead: Pick a clever hash function that maps the keys as densely as possible into as small a space as possible, and limit the size of each element in the array. In Python 3.6, a change was made to achieve the latter.

Looking a little deeper at how Python actually uses hash tables: Firstly, the has function isn’t as simple as I suggested above. The hash function produces a 64-bit integer, which is then used to identify an entry in the array (by using 3 bits of the integer, if we’ve got an 8-entry array). This is done because if enough data is inserted into the hash table it will have to be resized, and it’s convenient if we use the same hash values in the new bigger table. For the same reason, the hash values are stored in the array and not discarded as I suggested above. This means that our hash table ends up looking like this:

There are 3 items in each array entry, so our wasted space is actually 24 bytes for each empty array entry. Can we improve on this?

Yes we can. Instead of the array holding the hash value, key and value make the array store an index into another array:

It might look like we’ve just moved the data around, but note that this new array is dense: there is no wasted space in it between elements. Each newly inserted element gets added on to the end of this array. Now we only have 8 bytes of space used per empty entry in the hash table, rather than 24. On the other hand, we’ve added 8 bytes of space per filled entry in the hash table. But this is probably a win overall for size.

We still have all the desirable hash table properties, and the only cost is that we have to follow another pointer lookup to find the value we want. But on a modern CPU following pointers is cheap provided they are stored in cache, and data is much more likely to be stored in cache if it is small. So making the data structure smaller at the cost of another pointer lookup is likely to make things faster as well as smaller.

Wait, what does this have to do with preserving order?

Usually when we make code faster we end up making it less useful, or at least less flexible. But in this case something nice falls out for free. Note that the new array the data is stored in will automatically be in insertion order. If we want to iterate over the elements of the dict, we don’t need to look at the first table, we can just walk over the insertion-order table and produce the keys in insertion order.

For more information see this thread from python-dev way back in 2012 when the change was first proposed.

Does this mean that collections.OrderedDict is now obsolete?

Good question. Actually, no. The thing with an OrderedDict is that ordering is assumed to have meaning over and above the values stored. The most obvious way this comes into play is in comparison between dictionaries: if you compare two OrderedDict instances for equality, they only compare equal if the keys are the same and the insertion order is the same. For regular dict we don’t want this to be the case: two dicts with different insertion order but the same keys and values are the same dict. Even if we did want to change this we couldn’t, because it would break existing code.

It might be that dict and OrderedDict can share all their implementation internals and just have a flag set for “ordering is significant for equality test” or whatever. I’ll admit I haven’t looked into this.

Can I rely on this behaviour?

There’s a big difference between behaviour that happens on a certain version of Python and behaviour you can rely on in your code. As the docs quoted above say, as of Python 3.7 this is an official specification and not just an implementation detail. For more information about why this is specified, see this post from Guido.

Python basics: dict lookup

I had a bunch of posts I wanted to write that would refer to dict lookup in various ways, and rather than repeat myself in those I thought it would be useful to write a single post that establishes some of the basics. You may already know this stuff, in which case please ignore it.

Dicts are everywhere in Python, and lots of other operations are built out of them. Using dicts is what makes Python so flexible. Dicts aren’t just used by you when you’re writing your application, they are also used internally to implement a bunch of key Python features. It’s probably not obvious what I’m talking about; bear with me here.

What is a dict

You’re almost certainly familiar with using a dict explicitly in Python:

phone_number_lookup = {
    'alice': '555-4321',
    'bob': '555-1234'
print("Alice's phone number is %s" % phone_number_lookup['alice'])

There are a few properties of dictionaries that account for their wide use in Python:

  • You can look up an element in a dictionary quickly. I’m deliberately going to be vague about what “quickly” means here. The important thing is that it’s fast across a wide range of circumstances: it doesn’t get significantly slower when the dictionary has a lot of stuff in it, or when the keys or values are big values.
  • You can store anything as the values in a dictionary. Strings, numbers, classes, functions, absolutely anything that Python can work with.
  • You can use lots of different types (but not everything) as the keys in a dictionary. Most importantly for our purposes, dictionaries work very well with strings as keys.
  • Dictionaries don’t have any fixed ordering of keys.

About that ordering

It might seem surprising that one of the advantages I listed was a lack of ordering, which sounds like a disadvantage.

First, I’ll expand a little on what I mean here:

>>> mydict = {}
>>> mydict['apple'] = 100
>>> mydict['banana'] = 200
>>> mydict['cherry'] = 300
>>> for key, value in mydict.items():
...     print(key, value)
('cherry', 300)
('apple', 100)
('banana', 200)

The order it prints in isn’t the order they were inserted. It’s not alphabetical ordering. In fact, it’s not any particular ordering you might want. It’s just what’s most convenient for Python.

In fact, this ordering will change depending on the version of Python you use (the above was done on cpython 2.7, for reasons I’ll go into elsewhere). It could even vary depending on what day you run the program, or what computer you run it on. The point is, you shouldn’t be making any assumptions.

This might not sound like much of an advantage, but in fact by refusing to specify details like this there’s more flexibility to change the implementation. If there’s a bunch of code out there that relies on a particular dict ordering (say it requires that the keys are always returned in alphabetical order) then it might be impossible to improve the internal implementation without breaking a lot of code.

How dicts are used

You may have spotted that when you create an instance of a class, you can assign arbitrary members to it if you want:

class MyClass:
    def my_method(self, x):
        return x + 1
foo = MyClass()
foo.not_originally_there = 100

The member not_originally_there doesn’t belong to the class, but it needs to be stored somewhere on the object. This is achieved by each object having it’s own dict to store these ad hoc members:

>>> print(foo.__dict__)
{'not_originally_there': 100}

Hang on a minute. Objects have a dict so that we can look up any members that were added after the object was created, and don’t belong to the class (that’s our not_originally_there above). But what about the members of the class? They have to be stored somewhere. That’s right, they’re in a dict:

>>> print(MyClass.__dict__)
{'__module__': '__main__',
 'my_method': <function MyClass.my_method at 0x7f0c54cfe598>,
 '__dict__': <attribute '__dict__' of 'MyClass' objects>,
 '__weakref__': <attribute '__weakref__' of 'MyClass' objects>,
 '__doc__': None}

Note that we can see all the members of MyClass, including the __dict__ member itself and a bunch of internal Python stuff. In particular, we can see that my_method is a function with an entry in the dictionary.

But there’s more than just that. If you create a module, then it has a bunch of members each of which has a name. These are stored in a dictionary:

# my_module.py
def foo(x):
    return x + 1
def bar(x):
    return x - 1
# At the command-line...
>>> import my_module
>>> print(my_module.__dict__)
[.... vast amounts of rubbish...]
'foo': <function foo at 0x7f8b32474ae8>, 'bar': <function bar at 0x7f8b32474b70>}

What about that import my_module line above? When that’s executed, we’re creating a new local name my_module that refers to the real module. A string name that refers to an object. What does that remind you of?

>>> print(globals())
{'__name__': '__main__',
 ... lots of other stuff ...
 'my_module': <module 'my_module' from '...'>

In other words, the global scope we import the module into is a dictionary. When we try to use a function or variable from global scope, it’s looked up in this dictionary to find the corresponding value. Of course, virtually all languages will have some way of mapping names to objects at some sort of “global” (maybe file or module) scope. Python is just unusual in exposing the details to you, and in consistently using the same data structure you’re using in your own code at runtime.

Fine, but what’s that all good for?

Using dicts everywhere doesn’t give a massive advantage; it’s more a matter of making things consistent and easy to reason about. However, there are a few nice things that come of it.

It makes for an import system that is very flexible. You can import a module as an object, or import some or all of the contents of a module directly. You can remap the names you import into different names as you do so. You can conditionally import modules (maybe depending on whether a certain module is available) and it will behave sensibly:

    import simplejson as json
except ImportError:
    import json

Debugging and diagnostic tools can achieve a lot without much effort. If you want to peek into the state of an object, you can examine its dict and see all the data laid out for you in an easy way.

Another example are mock object libraries like unittest.mock. In this case, you want to replace some real piece of code with a mock implementation for the duration of your unit test. You want the existing test code to call what it thinks is real code, but have it call your instrumented test code instead. This is nice and natural in Python, because you can update the module dictionary to remap the name to point to your test code instead of the real code. The change takes effect immediately, and can be reversed at the end of the test.

A final point to note is that doing dict lookup in so many cases is one of the reasons why Python is slower than other languages. Lots of times (though not all the time) if you refer to a function or variable by name in Python you’re actually asking the runtime to do a dict lookup to find the value you’re talking about. Even if you use the same name several times in a function (perhaps in a loop), Python will end up doing the lookup each time you mention it. This is great for flexibility, but it can waste a lot of time. I’ll have a lot more to say about this later.

Length hints in Python

I was grubbing about trying to understand some library code the other day (I forget which one) and I came across a double-underscore method I’d never seen before:

    def __length_hint__(self):
        return ... # Some internals here.

Since this is a double-underscore method, it can’t be (or shouldn’t be) something the library invented. It must have significance to the Python runtime. A quick search on the manual shows that this is the protocol by which a class can implement the operator.length_hint operator (duh). But of course I’ve not seen that operator either.

length_hint is slightly unusual. Most of the functions in operator correspond to built-in functions or operators accessed via special syntax (like +, == etc.) These are always available regardless of what namespaces you have imported, but are also available as properly namespaced named functions in operator, which can be useful if you want to store them in variables and use them as function arguments. However, length_hint is relatively new and thus doesn’t have any corresponding built-in function.

Anyway, length_hint is defined to return “an estimated length for the object”. It’s actually defined in PEP-424, which makes it clear that the return value of length_hint may be wrong, and may give up and return a default value (the __length_hint__ method is allowed to raise NotImplemented). What’s the use of a function that doesn’t return the right value, or even any value at all?

A simple example

This is just a toy example. Nothing actually works the way I’m going to describe it, but hopefully it will give you the idea of what’s going on.

Let’s play the role of the Python runtime for a bit. Assume we have some empty memory:

The blocks in this diagram could be bytes or multi-byte blocks or memory pages or whatever, it doesn’t really matter.

We want to create a list of prime numbers that we’ve read in from a file. We create the list called primes and load the first three numbers from the file:

Just then, we take a break from this process and need to create another object. Perhaps it’s a log message? One of those kinds of things that happens when you’re in the middle of doing something else.

Having done that, we go back to filling our list with prime numbers from the file. But you can see the problem we run into: if there’s more than 1 more prime number in the list, we end up crashing into the other block of memory.

As you can imagine, this kind of thing happens all the time. Software would get nowhere without a way to deal with this. One of the objects will have to be moved, and that means copying memory around, which wastes time.

It would be so much better if we could tell ahead of time how big our list needed to be. If we did that, we could pre-allocate enough space for the whole list of prime numbers (we’re reading it in from a file, so it must be finite).

This is where it could be useful to know the approximate length of what we’re dealing with. For a file, we can get the file size (in bytes) quickly without having to read every byte of the file. This doesn’t tell us exactly how many numbers are in the file, because different numbers may take up different amounts of space. But if we assume 3 bytes per number (2 characters plus a newline for each one) we can divide the length by 3 and make a guess.

Given our guess, we can pre-allocate enough space for our list and put the log_message item in memory somewhere where we won’t end up having to move it.

Of course, the crude method of dividing the file length by 3 might fail, and we end up not leaving enough space. But that leaves us no worse than if we hadn’t used the length hint and just dropped the log message wherever we felt like it.

Again, none of this is how things actually work, but it’s an idea of why this might be useful.

This worked because we knew we were using a file, and we know things about files (they have a length known without reading the whole file, they include line ending characters etc.). But as good programmers, particularly good Python programmers, we don’t want to have our code making too many assumptions. We want a way to get a length hint from anything: a file, a TCP socket, a list, a generator, or even a custom class that hadn’t even been written when you first wrote your code.

How length hints are actually used

It’s easy to miss the use of length hints in the Python runtime code. Length hints aren’t used in the parts of the code written in Python, only by the C code. If you simply grep the source for __length_hint__ you’ll find it being mentioned a lot, but virtually all the references are the various implementations for built-in types (list, tuple etc.) and don’t show it in use.

Where length hints are used by cpython runtime C code, it is called via the wrapper function PyObject_LengthHint (in Objects/abstract.c). This includes the following code:

PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue)
    PyObject *hint, *result;
    Py_ssize_t res;
    if (_PyObject_HasLen(o)) {
        res = PyObject_Length(o);
        if (res < 0 && PyErr_Occurred()) {
            if (!PyErr_ExceptionMatches(PyExc_TypeError)) {
                return -1;
        else {
            return res;
    hint = _PyObject_LookupSpecial(o, &PyId___length_hint__);
    if (hint == NULL) {
        if (PyErr_Occurred()) {
            return -1;
        return defaultvalue;
    result = PyObject_CallFunctionObjArgs(hint, NULL);
    if (result == NULL) {
        if (PyErr_ExceptionMatches(PyExc_TypeError)) {
            return defaultvalue;
        return -1;
    else if (result == Py_NotImplemented) {
        return defaultvalue;
// ...

First it checks whether the object in question is of a standard sequence type, in which case it uses the standard method of getting the length directly. This is the check of _PyObject_HasLen. A real length will always be better than a length hint, if it is available.

Then it looks for a __length_hint__ member, bailing out if it doesn’t find one:

    hint = _PyObject_LookupSpecial(o, &PyId___length_hint__);
    if (hint == NULL) {
        if (PyErr_Occurred()) {
            return -1;
        return defaultvalue;

Assuming a length hint method is found, it calls the method and returns it.

This length hint function is used for example in listextend in Objects/listobject.c, which is the implementation of the built-in list type. listextend reads through an iterator, adding the elements to a list. This is obviously used to implement the list.extend() method, but it’s also used when concatenating lists or when instantiating a new list (the empty list is constructed, then the extend code is called).

The length hint is used like this (some details omitted):

    /* Guess a result list size. */
    n = PyObject_LengthHint(b, 8);
    if (n < 0) {/* Deal with bad result */}
    m = Py_SIZE(self);
    if (m > PY_SSIZE_T_MAX - n) {
        /* Overflow. Ignore this. */
    else {
        mn = m + n;
        /* Make room. */
        if (list_resize(self, mn) < 0)
            goto error;
        /* Make the list sane again. */
        Py_SIZE(self) = m;

The key point is the call to list_resize. This resizes the list to be big enough to include its existing contents, plus the hinted size of the iterator. Assuming the hint is correct, there will be no need to resize the list as it fills up. Even if it’s slightly off, it should still require fewer resizings, saving time overall.

Should you bother with this?

The above shows that this really is used, and if you implement a __length_hint__ on your iterators it will save some unnecessary work at runtime. But bear in mind that this is quite an edge case. Saving 10 or 20 allocations won’t make a measurable difference. It will only matter if you’re hitting this code thousands, maybe even millions of times in the hot loop of your algorithm.

My guess is that this is worth doing for any iterator where it’s easy to write code (say 2 or 3 lines) to generate a decent hint. If it’s more than that, wait until you’ve proven that this is something your code is hitting enough times to justify the effort.

An exception to this might be if you’re writing library code to be used by lots of different people. In that case, you can’t know how your code might be used and the payoff of your efforts could be multiplied by the number of users who end up using it. But the more generic your code is, the harder it is likely to be to write a decent length hint method.