Elixir, A Little Beyond The Basics - Part 3: maps

06 Oct 21

In part 1 we saw how lists are implemented and how this dictates what we can do with them (e.g., because it's a linked list, we cannot access a random element in constant time). Maps, on the other hand, expose the all of the functionality you're likely to need, so it isn't quite as important to be mindful of how they're implemented.

But just because Elixir maps do everything you expect of them (e.g., get, insert, update, delete, iterate), and do so efficiently, doesn't mean we shouldn't deepen our understanding of them.

Elixir maps with fewer than 32 keys are implemented using a sorted array. When this grows beyond 32 keys, it switches to a specialized tree structure called a persisted hash array mapped trie. Like linked lists, trees can often be immutable while efficient thanks to structural sharing. In the simplest case, you can imagine that inserting a new largest value in a tree (which goes to the right-most node) can result in a new tree which shares the same left-branch as the original.

First, we need to talk about terms. Terms is a word that you'll run into when talking about Elixir or Erlang. It just means any value. A map is a term, an atom is a term, a list is a term, a integer is a term, a structure is a term, a tuple is a term, everything is a term. What's special about Elixir's terms is that they have well defined ordering. There's is no value that cannot be compared to another value:

> 1 > %{over: 9000}

> true < {:id, "leto"}

> [1, 2, 3] == 33.2

Specifically, the ordering rules are:

number < atom < reference < function < port < pid < tuple < map < list < bitstring

This means, for example, that a number will always be less than an atom:

> 999999999999 < :zero

When we compare two lists, or two maps, or two tuples, then the ordering rules are applied to the contained values:

> [999999999999] < [:zero]

What does this have to do with maps? Well, unlike most languages, in Elixir, any term can be a key in a map:

> %{ %{over: 9000} => 1, {:ok, [3.2, ["b"], nil]} => 2 }
%{{:ok, [3.2, ["b"], nil]} => 2, %{over: 9000} => 1}

One place where this is actually useful is when looking up something (say, in a cache) by type and id. In other languages, you'd probably have to create a string key from the type and id, e.g. "#{type}:#{id}". But in Elixir, you can just use a tuple: {type, id}. There's no reason you can't even mix and match, have keys which are just the id and keys which are {:summary, id}.

Officially, Elixir maps are unordered. However, knowing that they're implemented using as a sorted array or a hash tree, means that the order is actually predictable. We know from the term ordering rules that an atom is always less than a tuple, therefore, give the following code:

m = %{"over" => 9000, {:gt, 9000} => true}
m |> Map.keys() |> IO.inspect()

We know that it'll always print {[{:gt, 9000}, "over"] because a tuple is always less than a string.

A lot of people will tell you that it's dangerous to rely on such internal details. Since, officially, maps are unordered, any assumption we make about ordering today might not be true tomorrow. And, because the ordering is based on term value (as opposed to, say, insertion order) it probably isn't that useful anyways. But just to highlight how dangerous it is to rely on such behavior, let's look at a map with integer keys:

m = %{2 => nil, 3 => nil, 5 => nil, 9 => nil}
m |> Map.keys() |> IO.inspect()

Can you guess the output? Since all the keys are integers, the the output is the sorted integers: [2, 3, 5, 9]. Let's try another, can you guess the output?:

m = %{
1 => nil, 2 => nil, 3 => nil, 4 => nil,
5 => nil, 6 => nil, 7 => nil, 8 => nil,
9 => nil, 10 => nil, 11 => nil, 12 => nil,
13 => nil, 14 => nil, 15 => nil, 16 => nil,
17 => nil, 18 => nil, 19 => nil, 20 => nil,
21 => nil, 22 => nil, 23 => nil, 24 => nil,
25 => nil, 26 => nil, 27 => nil, 28 => nil,
29 => nil, 30 => nil, 31 => nil, 32 => nil,
33 => nil
m |> Map.keys() |> IO.inspect()

If I told you that the output of the above is [33, 12, 23, 29, 30, 26, 31, 11, 9, 32, 25, 28, 6, 13, 20, 15, 14, 2, 7, 1, 8, 3, 17, 22, 21, 4, 24, 10, 27, 19, 5, 18, 16], could you guess what's going on?

Remember that maps with less than 32 keys are implemented differently than maps with 32 keys or more, and that's what we're seeing above. The ordering is still predictable: the same map with the same keys, no matter the order of the insertion, will result in the same output (at least within the same version of Elixir / Erlang). If having predictable ordering is something you need, consider the gb_trees module which I'll cover in a later part. Still, the fact that all terms have strict ordering implies that map ordering is predictable (else comparing the same two maps multiple times would yield different results, making it impossible to use maps as keys within maps).