Concatenation in Elixir

Sep 22, 2016

Data in Elixir is immutable. Superficially, the implications of this are easy to grasp. When you first start off, you'll occasionally forget and write things like:

def call(conn, _default) do
  set(conn, get_req_header(conn, "authorization"))

def set(conn, nil), do: conn

def set(conn, [authorization]) do
  user = load_user_from_authorization(authorization)
  assign(conn, :user, user) # THIS IS WRONG

The above doesn't work as intended because the line with the THIS IS WRONG comment discards the new conn created by the assign function. assign creates a new conn because data is immutable, and thus conn cannot be changed directly. The fix is to re-assign the result of assign to conn:

conn = assign(conn, :user, user)

Whether you're adding a key to a map, appending a value to list, or doing just about anything else, you'll follow the same pattern (or something similar). (If this is your first time seeing Elixir, there's a pipe operator (|>) which makes it all less tedious).

I want you to take a second and consider the implications of immutable data structures for an operation as simple as string or array concatenation. You can't leverage a buffer-based structure (vector, dynamic array, stringbuilder, arraylist, ...) to amortize the cost of an append because appending would mutate the structure. In Elixir, every append has to take the original value, copy it and add the new element.

Rather than relying on "traditional" data structures, immutable languages use Persistent Data Structures which always preserve previous versions of themselves (ideally not as full copies). The simplest example of this is a Linked List, which is actually how Lists are implement in Elixir (that has its own set of implications). How does a linked list help?, I hear you asking.

Imagine we have an initial list of integers which the variable x points to:

x → [1] → [2] → [3]

Obviously (I hope), we cannot append to this list without also mutating x. In Elixir, if we append to an list, a copy of the list is created. So we're still at square one. Or are we? What we can do without needing to copy is to prepend to our list:

y = [0 | x]

[0] → [1] → [2] → [3]
 ↑     ↑
 y     x

Prepending is handy, but what if you want to append? One common solution is to keep prepending and reverse (Enum.reverse) the final result. We still incur an O(N) traversal, but we avoid all the extra allocations + copying of appending.

There's another, more efficient, option: an io list. An io list is just a list of nested lists. If I wanted to concatenate 5 to [1, 2, 3, 4], I could either append (thus having to allocate a new list with 5 spots, and copying the values over), or I could create a new list to wrap the two values: [[1, 2, 3, 4], 5]. We don't have to create a new list and copy over N values. Instead, we create a new line which points to our existing list and add the new value. We can also nest deeply: [[[1, 2, 3, 4], 5], 6].

Of course, we probably want to do something with this data. At the worst case, we can use List.flatten to get a "normal" list back. From a performance standpoint, this is a lot like prepending + reversing. However, it's also possible to flatten it as we're processing it, which is how a lot of functions behave. For example, the following code:

File.write("data", [[[1, 2, 3, 4], 5], 6])

Produces the correct 6 byte file.

It still might not be obvious how you put an io list together. Usually, you'll do it through recursion. Here's an example using Enum.reduce. It take a list of integers, and calculates their factors, storing each result in an increasingly nested list

# taken from https://rosettacode.org/wiki/Factors_of_an_integer#Elixir
defmodule RC do
  def factor(1), do: [1]
  def factor(n) do
    (for i <- 1..div(n,2), rem(n,i)==0, do: i) ++ [n]

  # Recursive (faster version);
  def divisor(n), do: divisor(n, 1, []) |> Enum.sort

  defp divisor(n, i, factors) when n < i*i    , do: factors
  defp divisor(n, i, factors) when n == i*i   , do: [i | factors]
  defp divisor(n, i, factors) when rem(n,i)==0, do: divisor(n, i+1, [i, div(n,i) | factors])
  defp divisor(n, i, factors)                 , do: divisor(n, i+1, factors)

input = [10, 11, 12, 13, 14]
list = Enum.reduce(input, [], fn (i, acc) ->
  [acc | RC.factor(i)]

The above generates an io list that looks like: [[[[[[], 1, 2, 5, 10], 1, 11], 1, 2, 3, 4, 6, 12], 1, 13], 1, 2, 7, 14].