home

Elixir, A Little Beyond The Basics - Part 1: lists

Oct 01, 2021

Arrays are a fundamental data structure that you'll find in most languages. Arrays, whether static in size, or able to grow dynamically, have a number of desirable performance characteristics. But when immutability is a core feature of the language and runtime, as it is with Elixir and Erlang, arrays have a fatal flaw: they cannot be updated without requiring the entire array to be copied.

Consider the following javascript example:

a = [1, 2, 3]

b = a
b[2] = 9

console.log(a)

When we print a, what do we see? [1, 2, 9]. Writing to the array is fast - we can update any index in the array in constant time (i.e. regardless of the size of the array). But in order of this to be true, the array must be mutated, which is why when we update b we see the change reflected in a. If we did not want a to change, we'd have to clone it when assigning it to b.

In order to reconcile the need for programs to change data and the desire for immutability, languages like Elixir rely on persistent data structure (the word persisted has nothing to do with persisting the data to disk). These are data structures which can be changed while preserving their previous value. This is generally accomplished by sharing part of the underlying data / structure.

This is best understood by looking at a singly linked list: the data structure behind Elixir's list. Each value in a linked list, often called a Node, points to the next value in the list (up until the end, which is often represented by some null value). Whereas an array is a contiguous block of memory, linked list nodes can be spread out in memory and are only chained together thanks to the next field.

Here's a basic example written in Go:

type Node struct {
  value int
  next  *Node
}

func NewList(value int) *Node {
  return &Node{value: value, next: nil}
}

func (n *Node) Append(value int) {
  // find the last node (the node where n.next == nil)
  for ; n.next != nil; n = n.next {
  }
  n.next = &Node{value: value, next: nil}
}

You can see that in order to append a value, we must traverse the list until we find the last element (the last element being the one where next is nil). Let's add a Print function and see how this list behaves:

func (n *Node) Print() {
  for ; n != nil; n = n.next {
    fmt.Print(n.value)
    fmt.Print(" ")
  }
  fmt.Println("")
}

func main() {
  a := NewList(1)
  a.Append(2)
  a.Append(3)

  b := a
  b.Append(4)

  a.Print()
  b.Print()
}

Can you guess what this will print?

The answer is two identical lines, each with the value of 1 2 3 4. Both a and b are operating on the same nodes, so when we modify one of the nodes in any way, it's reflected in both variables:

[1] --> [2] --> [3] --> nil
↑ ↑
a b

So far, we haven't isolated changes to b. Let's add another function: Prepend and make use of it:

func (n *Node) Prepend(value int) *Node {
  new := &Node{value: value, next: n}
  return new
}

func main() {
  a := NewList(1)
  a.Append(2)
  a.Append(3)

  b := a.Prepend(0)

  a.Print()
  b.Print()
}

Importantly, Prepend returns a new node. This is necessary in order for the newly created node to be accessible: there's no prev field as you'd find in a doubly linked-list. Now our data looks like:

[0] --> [1] --> [2] --> [3] --> nil
↑       ↑
b       a

When we call a.Print() we get 1 2 3, but when we call b.Print() we get 0 1 2 3. Notice that this prepend operation is efficient and doesn't depend on the length of the list. We have not mutated a; absolutely nothing about a has changed. What we've done is used a to create b. This type of sharing is fundamental to how persisted data structures work.

There are a lot of consequences to using singly linked lists. As we've just seen, the major benefit is that we get one efficient operation for adding values to the list while keeping it immutable, namely prepend. On the flip side, we cannot efficiently append or randomly access an element. Nor can we traverse the list in reverse (to do this, we need to create a new list). When you first hear this about lists in Elixir it probably seemed like a pretty big deal. And, in truth, in some cases, it is. But what I've come to realize is that overwhelmingly, being able to iterate the list and prepend to it is almost always all I need.

Having said all of this, it is possible to append to a list in Elixir using the ++ operator:

a = [1, 2, 3]
b = a ++ [4, 5]

What this does is copy the left value and prepends it to the right. Essentially, we clone a, and end up with two distinct lists:

[1] --> [2] --> [3] --> nil
↑
a

[1] --> [2] --> [3] --> [4] --> [5] --> nil
↑
b

This is an expensive operation in general, but more so in a loop where a single append keeps copying an every growing list. This is why, for all but the most trivial cases, you'll want to prepend to the list and then, only once you have the final result, reverse it to restore the intended order (or better, just leave is as-is if ordering doesn't matter).