Elixir, A Little Beyond The Basics - Part 3: maps

Oct 06, 2021

In part 1 we saw how lists are implemented and how this dictates what we can do with them (e.g., because it's a linked list, we cannot access a random element in constant time). Maps, on the other hand, expose the all of the functionality you're likely to need, so it isn't quite as important to be mindful of how they're implemented.

But just because Elixir maps do everything you expect of them (e.g., get, insert, update, delete, iterate), and do so efficiently, doesn't mean we shouldn't expand our understanding.

Elixir maps with fewer than 32 keys are implemented using a sorted list. When this grows to 32 keys or beyond, it switches to a specialized tree structure called a persisted hash array mapped trie. Like linked lists, trees can often be immutable and efficient thanks to structural sharing. In the simplest case, you can imagine that inserting a new largest value in a tree (which goes to the right-most node) can result in a new tree which shares the same left-branch as the original.

First, we need to talk about terms. Terms is a word that you'll run into when talking about Elixir or Erlang. It just means any value. A map is a term, an atom is a term, a list is a term, a integer is a term, a structure is a term, a tuple is a term, everything is a term. What's special about Elixir's terms is that they have well defined ordering. There's is no value that cannot be compared to another value:

> 1 > %{over: 9000}
false

> true < {:id, "leto"}
true

> [1, 2, 3] == 33.2
false

Specifically, the ordering rules are:

number < atom < reference < function < port < pid < tuple < map < list < bitstring

This means, for example, that a number will always be less than an atom:

> 999999999999 < :zero
true

When we compare two lists, or two maps, or two tuples, then the ordering rules are applied on the contained values:

> [999999999999] < [:zero]
true

What does this have to do with maps? Well, unlike most languages, in Elixir, any term can be a key in a map. The following is a perfectly valid map with two, admittedly unusual, keys:

%{
  %{over: 9000} => 1,
  {:ok, [3.2, ["b"], nil]} => 2
}

One place where this is actually useful is when looking up something (say, in a cache) by type and id. In other languages, you'd probably have to create a string key from the type and id, e.g. "#{type}:#{id}". But in Elixir, you can just use a tuple: {type, id}. There's no reason you can't even mix and match, have keys which are just the id and keys which are {:summary, id}.

Officially, Elixir maps are unordered. However, knowing that they're implemented using as a sorted list or a tree, means that the order is actually predictable. We know from the term ordering rules that an atom is always less than a tuple, therefore, give the following code:

m = %{"over" => 9000, {:gt, 9000} => true}
m |> Map.keys() |> IO.inspect()

We know that it'll always print {[{:gt, 9000}, "over"] because a tuple is always less than a string.

A lot of people will tell you that it's dangerous to rely on such internal details. Since, officially, maps are unordered, any assumption we make about ordering today might not be true tomorrow. And, because the ordering is based on term value (as opposed to, say, insertion order) it probably isn't that useful anyways. But just to highlight how dangerous it is to rely on such behavior, let's look at a map with integer keys:

m = %{2 => nil, 3 => nil, 5 => nil, 9 => nil}
m |> Map.keys() |> IO.inspect()

Can you guess the output? Since all the keys are integers, the output is the sorted integers: [2, 3, 5, 9]. Let's try another, can you guess the output?:

m = %{
  1 => nil, 2 => nil, 3 => nil, 4 => nil,
  5 => nil, 6 => nil, 7 => nil, 8 => nil,
  9 => nil, 10 => nil, 11 => nil, 12 => nil,
  13 => nil, 14 => nil, 15 => nil, 16 => nil,
  17 => nil, 18 => nil, 19 => nil, 20 => nil,
  21 => nil, 22 => nil, 23 => nil, 24 => nil,
  25 => nil, 26 => nil, 27 => nil, 28 => nil,
  29 => nil, 30 => nil, 31 => nil, 32 => nil,
  33 => nil
}
m |> Map.keys() |> IO.inspect()

I wouldn't blame you for thinking that the output would be 1 to 33. But it's actually [33, 12, 23, 29, 30, 26, 31, 11, 9, 32, 25, 28, 6, 13, 20, 15, 14, 2, 7, 1, 8, 3, 17, 22, 21, 4, 24, 10, 27, 19, 5, 18, 16], why?

Remember that maps with less than 32 keys are implemented differently than maps with 32 keys or more, and that's what we're seeing above. The ordering is still predictable: the same map with the same keys, no matter the order of the insertion, will result in the same output (at least within the same version of Elixir / Erlang). If having predictable ordering is something you need, consider the gb_trees module which I'll cover in a later part. Still, the fact that all terms have strict ordering implies that map ordering is predictable (else comparing the same two maps multiple times would yield different results, making it impossible to use maps as keys within maps).