
Elixir, A Little Beyond The Basics - Part 2: iolist

Oct 04, 2021

Now and again you'll find yourself wanting to dynamically build a string. For simple cases, string concatenation will do. But this quickly becomes inefficient as you end up repeatedly copying an ever-growing value (we previously spoke briefly about the performance of the ++ operator). In Go, you'd likely reach for strings.Builder (or a bytes.Buffer in older versions). In Java, a StringBuilder or StringBuffer; in .NET, a StringBuilder. In Rust, a String. These all follow the same pattern: they try to reduce the number of allocations and the frequency of data copying by over-allocating (i.e. keeping spare capacity for future operations), which amortizes the cost of concatenation.

In Elixir, this isn't an option. We don't have control over memory allocation, and even if we did, we wouldn't be able to mutate it anyway. To deal with this, Elixir and Erlang have a type called an iolist. An iolist is a list that can only contain other iolists, strings (binaries) or bytes (integers from 0 to 255). Here are some examples:

["it's over 9000!"]
["it's ", "over", " 9000!"]
["it's ", ["over ", "9000!"]]
[[], ["it's"]," ", ["over ", "9000!"]]
[[["it", "'s", [[[]]], " o", ["v", ["e", "r"]]], " 9000!"]]

Note that it doesn't matter how deeply the lists are nested, or how many strings or iolists the parent or any of the child lists contain.
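The examples above only use strings, but bytes are allowed too. Here's a small sketch that mixes them in, using Elixir's ?c syntax to get the integer value of a character (the variable names are just for illustration):

```elixir
# ?9 is the integer 57 and ?! is the integer 33; both fit in a single byte,
# so they are valid iolist elements alongside strings and nested lists.
with_bytes = ["it's over ", [?9, "000"], ?!]

# Flatten the nested structure into a single binary.
flattened = IO.iodata_to_binary(with_bytes)

IO.puts(flattened)
```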

We can convert these iolists into strings using :erlang.iolist_to_binary or Elixir's wrapper IO.iodata_to_binary. They all result in the same string being created:

> :erlang.iolist_to_binary(["it's over 9000!"])
"it's over 9000!"

> :erlang.iolist_to_binary(["it's ", "over", " 9000!"])
"it's over 9000!"

> :erlang.iolist_to_binary(["it's ", ["over ", "9000!"]])
"it's over 9000!"

> :erlang.iolist_to_binary([[], ["it's"]," ", ["over ", "9000!"]])
"it's over 9000!"

> :erlang.iolist_to_binary([[["it", "'s", [[[]]], " o", ["v", ["e", "r"]]], " 9000!"]])
"it's over 9000!"
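The same check can be written with Elixir's IO.iodata_to_binary/1. This sketch also uses IO.iodata_length/1, which reports the byte size of the result without us having to build the binary first:

```elixir
variants = [
  ["it's over 9000!"],
  ["it's ", "over", " 9000!"],
  [[], ["it's"], " ", ["over ", "9000!"]]
]

# Flatten every variant into a binary.
binaries = Enum.map(variants, &IO.iodata_to_binary/1)

# All of them collapse to one and the same string.
all_equal? = Enum.uniq(binaries) == ["it's over 9000!"]

# Byte size of each variant, computed from the iolist itself.
lengths = Enum.map(variants, &IO.iodata_length/1)

IO.inspect({all_equal?, lengths})
```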

Compared to concatenation, iolists are more efficient to create. Every concatenation is a new allocation and a copy of the values into an ever-growing block of memory. With iolists, you're only creating a list that points to existing values. Compare these two versions of the same function:

def say(tense, power) do
  text = "it"
  text = text <>
    case tense do
      :past -> " was"
      :present -> " is"
      :future -> " will be"
    end

  text = text <>
    case power > 9000 do
      true -> " over"
      false -> " under (or equal!)"
    end

  IO.puts(text <> " 9000!")
end

# vs

def say(tense, power) do
  text = ["it"]
  text =
    case tense do
      :past -> [text, " was"]
      :present -> [text, " is"]
      :future -> [text, " will be"]
    end

  text =
    case power > 9000 do
      true -> [text, " over"]
      false -> [text, " under (or equal!)"]
    end

  IO.puts([text, " 9000!"])
end

In our iolist version, there's only a single "it". Yes, you are allocating new lists, but this is relatively cheap. In the first version, "it" is copied three times (i.e. into "it is", "it is over" and "it is over 9000!"). As the number of pieces grows, this repeated copying becomes quadratic.
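The same two approaches look like this when the number of pieces is larger. This is only a sketch of the accumulator pattern, with made-up input (the numbers 1 to 1000 as strings); it makes no claim about actual timings on your machine:

```elixir
# Illustrative input: "1", "2", ..., "1000".
words = Enum.map(1..1_000, &Integer.to_string/1)

# Concatenation: every step produces a new, longer binary.
concatenated = Enum.reduce(words, "", fn word, acc -> acc <> word <> " " end)

# iolist: every step wraps the previous accumulator in a new, small list.
# The existing strings are referenced, not copied.
as_iolist = Enum.reduce(words, [], fn word, acc -> [acc, word, " "] end)

# Once flattened, the iolist describes exactly the same bytes.
same? = IO.iodata_to_binary(as_iolist) == concatenated

IO.inspect(same?)
```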

Iolists become particularly efficient when used with modules and functions that "understand" them. We saw how :erlang.iolist_to_binary/1 can be used to turn an iolist into a binary. But some functions can operate on the iolist directly. For example, File.write!/2 accepts and works with iolists just fine:

File.write!("test", [["it's"], " ", "over", [[[], " 9", ["000"]], "!"]])

This is efficient because of the writev system call, which itself operates directly on a list of values. Significant allocations and copying can be avoided: at no point is the final string ever assembled in your program.
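To convince yourself that the file really ends up with the flattened content, you can write the iolist and read it back. A quick sketch, where the file name in the system's temp directory is just a placeholder:

```elixir
# Placeholder path; any writable location works.
path = Path.join(System.tmp_dir!(), "iolist_example.txt")

iolist = [["it's"], " ", "over", [[[], " 9", ["000"]], "!"]]

# File.write!/2 takes the nested list as-is; we never flatten it ourselves.
File.write!(path, iolist)

# Reading the file back gives us one plain binary.
contents = File.read!(path)
File.rm!(path)

IO.puts(contents)
```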

Many libraries expose a _to_iodata variant which returns an iolist that you can then pass on to other iolist-aware libraries. For example, Jason exposes an encode_to_iodata function. In fact, the encode/1 function that you're probably using is basically calling encode_to_iodata/1 and then :erlang.iolist_to_binary/1. It's quite common to be able to get an iolist as a return value from one function and pass it into another. For example, we can take the iodata returned from Jason.encode_to_iodata!/1 and pass it to Plug.Conn.send_resp/3.
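If you're writing your own library, the same split is easy to offer. Here's a minimal sketch with a hypothetical Greeting module (it's not from Jason or any other library): the _to_iodata function does the real work, and the plain variant just flattens its result.

```elixir
defmodule Greeting do
  # Hypothetical module, only here to illustrate the naming pattern.

  # Returns iodata; nothing is concatenated or copied.
  def render_to_iodata(name), do: ["Hello, ", name, "!\n"]

  # Convenience wrapper for callers that need a single binary.
  def render(name), do: name |> render_to_iodata() |> IO.iodata_to_binary()
end

# Callers that only forward the data can skip the flattening step.
IO.binwrite(Greeting.render_to_iodata("Goku"))

# Callers that need a string use the wrapper.
greeting = Greeting.render("Goku")
```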

One thing to be aware of is that it isn't obvious whether a function uses an iolist directly or simply converts it to a string. For example, Jason.decode accepts an iolist, but immediately converts it to a string.

Iolists are deceptively easy, yet in my experience, that simplicity doesn't sink in until you use them to solve a problem. To that end, let's try an exercise. You're given some input where each line is a valid JSON object. This is commonly known as JSON Lines, and it's quite common in logging systems:

{"id": 1, "message": "it's"}
{"id": 2, "message": "over"}
{"id": 3, "message": "9000"}
{"id": 4, "message": "!!!!"}

We want to take the above 4 valid JSON lines and turn them into 1 valid JSON object:

{
  "total": 4,
  "data": [
    {"id": 1, "message": "it's"},
    {"id": 2, "message": "over"},
    {"id": 3, "message": "9000"},
    {"id": 4, "message": "!!!!"}
  ]
}

Here's the simple solution, which does not involve using iolists:

# normally this would come from a file
# but let's use a string and keep our focus on the data manipulation
input = ~s({"id": 1, "message": "it's"}
{"id": 2, "message": "over"}
{"id": 3, "message": "9000"}
{"id": 4, "message": "!!!!"})

entries = input
|> String.split("\n", trim: true)
|> Enum.map(fn line -> Jason.decode!(line) end)

%{total: Enum.count(entries), data: entries}
|> Jason.encode!()
|> IO.puts()

This code decodes the JSON only to re-encode it. It's quite wasteful; we already have perfectly valid serialized JSON, and we just want to glue it together. Can you figure out a solution that doesn't involve decoding and re-encoding?

input = ~s({"id": 1, "message": "it's"}
{"id": 2, "message": "over"}
{"id": 3, "message": "9000"}
{"id": 4, "message": "!!!!"})

[first | lines] = String.split(input, "\n", trim: true)

# the count starts at 1 because the first line is already in the accumulator
{count, entries} =
  Enum.reduce(lines, {1, [first]}, fn line, {count, acc} ->
    {count + 1, [acc, ",", line]}
  end)

IO.puts(["{\"total\":", to_string(count), ",\"data\":[", entries, "]}"])

Remember that an iolist is defined as a list containing strings, bytes or other iolists. This is why our count must be converted to a string via to_string(count). Without this, 4 would be treated as the ASCII control character EOT (a single byte with the value 4), which is not what we want. If our integer were negative or greater than 255, we'd get an error:

> :erlang.iolist_to_binary(["a", 256])
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: not an iodata term

    :erlang.iolist_to_binary(["a", 256])
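Here are the three cases side by side, as a small sketch (the "total:" prefix and the variable names are only for illustration):

```elixir
# A raw 4 is taken as a single byte with the value 4, not as the character "4".
raw = IO.iodata_to_binary(["total:", 4])

# Converting the integer first gives us the digit we actually want.
converted = IO.iodata_to_binary(["total:", Integer.to_string(4)])

# An integer outside 0..255 is not a byte, so flattening raises.
out_of_range =
  try do
    IO.iodata_to_binary(["a", 256])
  rescue
    ArgumentError -> :not_iodata
  end

IO.inspect({raw, converted, out_of_range})
```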

Also note that we start our list with the first entry. This is a trick that works in other languages too. It allows us to prepend our separator (a comma in this case) in each iteration. Commonly, this is implemented by conditionally appending the comma for all but the last element (requiring an if inside the loop). To be completely safe, we should probably guard against an empty input though.
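One way to add that guard is to match on the result of the split. This is a sketch only: JsonLines.wrap/1 is a hypothetical helper, and it builds the "total" and "data" keys from the target object shown earlier.

```elixir
defmodule JsonLines do
  # Hypothetical helper: wraps newline-separated JSON objects in an envelope
  # without decoding them. Returns iodata.
  def wrap(input) do
    case String.split(input, "\n", trim: true) do
      # Empty input: there is no first entry to seed the list with.
      [] ->
        ~s({"total":0,"data":[]})

      [first | rest] ->
        # The first line is already counted, so the total starts at 1.
        {total, entries} =
          Enum.reduce(rest, {1, [first]}, fn line, {total, acc} ->
            {total + 1, [acc, ",", line]}
          end)

        [~s({"total":), Integer.to_string(total), ~s(,"data":[), entries, "]}"]
    end
  end
end

IO.puts(JsonLines.wrap(~s({"id": 1}\n{"id": 2})))
IO.puts(JsonLines.wrap(""))
```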

Finally, you'll often see the word iodata instead of, or in addition to, iolist. In fact, we already saw Jason's encode_to_iodata/1 function. The two are often used interchangeably, but they are different. Iodata is either an iolist or a string (binary). [] and ["it", " ", ["o",["v", "e", "r"]]] are both an iolist and iodata, while "it's over 9000" is only iodata. But, like I said, the terms are used quite interchangeably. Even Erlang's :erlang.iolist_to_binary/1 accepts both.
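A quick sketch of that distinction, using the same values (the variable names are just for illustration):

```elixir
plain = "it's over 9000"
nested = ["it", " ", ["o", ["v", "e", "r"]]]

# A plain string is iodata, but it is not a list, so it is not an iolist.
plain_is_list? = is_list(plain)
nested_is_list? = is_list(nested)

# Both conversion functions accept the plain string and hand it back as-is.
from_elixir = IO.iodata_to_binary(plain)
from_erlang = :erlang.iolist_to_binary(plain)

# The nested list and the empty list are flattened as usual.
from_nested = IO.iodata_to_binary(nested)
from_empty = IO.iodata_to_binary([])

IO.inspect({plain_is_list?, nested_is_list?, from_nested, from_empty})
```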