Elixir, A Little Beyond The Basics - Part 6: processes

Oct 20, 2021

One of the things I like the most about Elixir is that complexity in the language, standard library and runtime is well layered. By this I mean that it's possible to get up to speed and be productive quickly while slowly and naturally uncovering more advanced and nuanced concepts.

I can think of no better example of this than processes. Processes are a fundamental part of programming in Elixir, yet it's common for developers to build full Phoenix applications without interacting with them directly. Then one day, a need arises, which, with some online searching, leads to examples of Agents and GenServers and a whole new world opens up.

One of the reasons this is such a smooth experience is the synergy between the language, standard library and runtime: GenServers, as a whole, have nice ergonomics despite their relative complexity. But this same ease-of-use can result in a superficial understanding of what's happening behind the scenes.

Let's start from zero. Elixir's terminology (inherited from Erlang) is initially confusing, but, as you learn more, accurate. Elixir doesn't have "threads", it has "processes". When you're just starting out though, it's safe to think of them as any other "green thread" (or "virtual thread"), such as goroutines. This means that they're managed by the Erlang runtime, are cheap to create and have low overhead.

We start a process with the spawn function:

spawn fn -> IO.puts("over 9000!") end

It's worth pointing out that everything is running in a process. If you run the above code in an iex terminal, that's a process (which, like any other code, can spawn more processes).
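To get a feel for how cheap processes are, here is a minimal sketch that spawns ten thousand of them. The count of 10_000 is an arbitrary number picked for illustration, and each process does nothing more than a trivial multiplication before exiting.

```elixir
# The code running this script is itself a process, so self/0 works here.
IO.inspect(self())

# Spawn 10_000 short-lived processes; spawn/1 returns the new process' pid.
pids = for i <- 1..10_000, do: spawn(fn -> i * 2 end)

IO.puts("spawned #{length(pids)} processes")

# None of the spawned pids is the pid of the process that spawned them.
IO.inspect(self() in pids)
```

On a typical machine this should finish almost instantly, which is why it's common to use a process per connection, per job or per user without much thought.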

Where Elixir processes differ from most thread and green thread implementations is that they're isolated. They don't share memory with other processes, including the process that spawned them. This has implications on how processes interact with each other, as typical concurrency primitives (e.g. mutexes) cannot work. Specifically, all interactions between processes are done via message passing. Let's look at an example:

pid = spawn fn ->
  receive do
    msg -> IO.puts("received: #{msg}")
  end
end
send(pid, "hello")

Here we see three things. First, spawn returns the process identifier (PID or pid). Second, we use send to send a message to a pid. Above we're sending a string, but we can send any term (i.e. anything). Finally, receive is used to read messages.

By default, receive will block until a message is received. We can change this behavior, as well as add pattern matching to the received message. We can also receive multiple messages by calling receive multiple times:

defmodule MyApp.Receiver do
  def run(counter) do
    IO.puts(counter)

    receive do
      :incr -> run(counter + 1)
      {:incr, by} -> run(counter + by)
      :stop -> :ok
    after
      250 -> run(counter) # executed after 250 milliseconds
    end

  end
end

pid = spawn fn -> MyApp.Receiver.run(0) end
:timer.sleep(1000)

send(pid, :incr)
:timer.sleep(2000)

send(pid, {:incr, 10})
:timer.sleep(1000)

send(pid, :stop)

You might get different results each time you run this, but it should be similar to 0 printed four times, followed by 1 printed eight times and finally 11 printed four times.

Thread Safety

What you must understand is that each Elixir process has its own mailbox. Messages sent to a process are placed at the end of the mailbox. When a process reads from its mailbox, using receive, the message is removed from the front of its mailbox (it's a queue). Multiple processes can send messages to the same target process, but that target process can only process one message at a time.

When you combine this mailbox pattern with process isolation, you get a strong guarantee: a process can always manipulate its data without needing any concurrency control. (To be clear, there is synchronization within the runtime to allow concurrent access to the mailbox, but this is completely hidden from the application).
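As a rough sketch of this guarantee, the following has 50 processes concurrently sending increments to a single counter process. Sketch.Counter, the :incr, :done and :total messages and the numbers involved are all made up for this illustration. The counter updates its state with no locking at all, yet no increment is lost.

```elixir
defmodule Sketch.Counter do
  # Hypothetical module, for illustration only.
  # Once every sender has reported :done, reply with the final count.
  def loop(count, 0, reply_pid), do: send(reply_pid, {:total, count})

  def loop(count, pending, reply_pid) do
    receive do
      :incr -> loop(count + 1, pending, reply_pid)
      :done -> loop(count, pending - 1, reply_pid)
    end
  end
end

parent = self()
counter = spawn(fn -> Sketch.Counter.loop(0, 50, parent) end)

# 50 senders run concurrently; each sends 100 increments, then :done.
for _ <- 1..50 do
  spawn(fn ->
    for _ <- 1..100, do: send(counter, :incr)
    send(counter, :done)
  end)
end

total =
  receive do
    {:total, count} -> count
  after
    5_000 -> raise "timed out waiting for the counter"
  end

IO.puts(total)
```

This should print 5000. Messages from one sender to the counter arrive in the order they were sent, so by the time the counter has seen all 50 :done messages it has also seen every :incr.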

You might be thinking to yourself that our above example is too simple. Of course counter can't be shared between processes: it's just an integer that exists on run's stack. But the reality is that processes always have exclusive access to their data (which we typically call their "state"), regardless of the type. This is because messages which cross process boundaries are deep copied (binaries, and therefore strings, larger than 64 bytes are a special case: they're stored outside the process heap and only a reference is passed).

This is less efficient than passing references. But it has two significant advantages. First, as we already mentioned, processes are thread-safe without any additional concurrency control. Second, it allows the garbage collector to be more efficient: since data is isolated per process, garbage collection is also isolated per process.

Send & Receive

While receive will block, send never blocks. In fact, we can send to a non-existent process:

pid = spawn fn -> end

:timer.sleep(100)
Process.alive?(pid) # false, the process has exited

send(pid, :hello)

If you've used GenServers, you know that you can interact with them asynchronously (via cast/2) and synchronously (via call/2). How is that possible? After sending its request, our sender can itself call receive to wait for a reply:

pid = spawn fn ->
  receive do
    {:add, a, b, reply_pid} -> send(reply_pid, {:sum, a + b})
  end
end

send(pid, {:add, 9000, 1, self()})
receive do
  {:sum, value} -> IO.puts(value)
after
  5000 -> raise "timeout"
end

self/0 returns the current PID, which we need to send to our calculator in order for it to know where to direct the reply.
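One weakness of the reply pattern above is that any {:sum, value} message in the sender's mailbox would be taken for the reply. A possible refinement, sketched below, is to tag each request with a unique reference from make_ref/0 and only accept a reply carrying that same reference. The five-element request tuple and the three-element reply are assumptions of this sketch, not something the original example defines.

```elixir
calculator =
  spawn(fn ->
    receive do
      # Echo the caller's reference back so the reply can be matched.
      {:add, a, b, reply_pid, ref} -> send(reply_pid, {:sum, ref, a + b})
    end
  end)

# A stray message that looks like a reply but carries a different reference.
send(self(), {:sum, make_ref(), -1})

ref = make_ref()
send(calculator, {:add, 9000, 1, self(), ref})

sum =
  receive do
    # ^ref only matches a reply tagged with our own reference.
    {:sum, ^ref, value} -> value
  after
    5_000 -> raise "timeout"
  end

IO.puts(sum)
```

This should print 9001. The stray message is skipped rather than consumed, and it stays in the mailbox.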

In addition to send/2, there's also Process.send_after/3 which can be used to send a message, to a pid, after a certain amount of time:

Process.send_after(pid, :refresh_token, :timer.minutes(1))
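A common use of Process.send_after/3 is for a process to send a delayed message to itself, which gives it a simple periodic tick. The following is a minimal sketch. Sketch.Ticker, the :tick message and the 50 millisecond interval are invented for this example.

```elixir
defmodule Sketch.Ticker do
  # Hypothetical module: arms a timer, waits for :tick, then repeats.
  def run(0), do: :done

  def run(remaining) do
    Process.send_after(self(), :tick, 50)

    receive do
      :tick ->
        IO.puts("tick, #{remaining - 1} left")
        run(remaining - 1)
    end
  end
end

result = Sketch.Ticker.run(3)

# Process.send_after/3 returns a timer reference. Cancelling a timer that
# hasn't fired yet returns the number of milliseconds that were left.
timer = Process.send_after(self(), :never_delivered, 60_000)
remaining_ms = Process.cancel_timer(timer)
IO.inspect(remaining_ms)
```

Because the timer is re-armed only after the previous tick is handled, slow handling delays the next tick rather than letting ticks pile up in the mailbox.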

receive also has one neat trick: it'll search the mailbox for messages that match the given pattern(s):

pid = spawn fn ->
  receive do
    {:add, a, b} when is_number(a) and is_number(b) -> IO.puts("#{a} + #{b} == #{a + b}")
    {:sub, a, b} when is_number(a) and is_number(b) -> IO.puts("#{a} - #{b} == #{a - b}")
  end
end

send(pid, {:multiply, 9000, 2})
send(pid, {:add, "hello", "world"})
send(pid, {:sub, 1000, 5})

The above will only output 1000 - 5 == 995, as the first two messages will be ignored. In the above example, our spawned process exits shortly after we send the 3rd message, as this unblocks the receiver and causes the function to exit. However, if our process was long-lived, it's important to know that the first two messages we sent would continue to exist in the process' mailbox (the runtime doesn't know if some later call to receive will want those messages). We can see this in action:

pid = spawn fn ->
  receive do
    {:add, a, b} when is_number(a) and is_number(b) -> IO.puts("#{a} + #{b} == #{a + b}")
    {:sub, a, b} when is_number(a) and is_number(b) -> IO.puts("#{a} - #{b} == #{a - b}")
  end

  receive do
    msg -> IO.inspect("unknown1: #{inspect(msg)}")
  end
  receive do
    msg -> IO.inspect("unknown2: #{inspect(msg)}")
  end

  # this will block, since we've now emptied our mailbox
  receive do
    _ -> raise "should not be called"
  end
end

send(pid, {:multiply, 9000, 2})
send(pid, {:add, "hello", "world"})
send(pid, {:sub, 1000, 5})

Which outputs

1000 - 5 == 995
"unknown1: {:multiply, 9000, 2}"
"unknown2: {:add, \"hello\", \"world\"}"

Normally, when we call receive, we'll get messages in the order that they were written into the process' mailbox. But we can see from the above, where the first message received is actually the last sent, that a selective receive, via pattern matching, adds another layer.
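Since unmatched messages stay in the mailbox, a long-lived process that uses selective receive may occasionally want to clear out whatever is left. One way to do that, sketched here, is a receive with an after 0 clause, which returns immediately once nothing matches. Sketch.Mailbox.drain is a hypothetical helper, and the example assumes it runs in a process whose mailbox starts out empty (such as a fresh script).

```elixir
defmodule Sketch.Mailbox do
  # Hypothetical helper: removes every message currently in the calling
  # process' mailbox and returns them in arrival order.
  def drain(acc \\ []) do
    receive do
      msg -> drain([msg | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end

send(self(), {:multiply, 9000, 2})
send(self(), {:add, "hello", "world"})
send(self(), {:sub, 1000, 5})

# Selective receive: takes the :sub message even though it was sent last.
first =
  receive do
    {:sub, a, b} -> a - b
  after
    0 -> :none
  end

# The two unmatched messages are still there, in their original order.
leftovers = Sketch.Mailbox.drain()

IO.inspect(first)
IO.inspect(leftovers)
```

Whether dropping unknown messages is acceptable depends on the application. Logging them instead of silently discarding them is often the safer choice.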

Process Names, Process Info and Process Dictionaries

We can give a process a name, and use the name rather than the pid when sending a message. This makes it possible to send messages to a process without knowing its current pid.

pid = spawn fn ->
  receive do
    {:add, a, b, reply_pid} -> send(reply_pid, {:sum, a + b})
  end
end

Process.register(pid, :myapp_calculator)
send(:myapp_calculator, {:add, 9000, 2, self()})
receive do
  msg -> IO.inspect(msg)
end

Process.whereis/1 can be used to get the pid for a given name. It returns nil if no process is registered with that name. Process.registered/0 can be used to get a list of all registered process names.
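Here is a small sketch of these lookup functions. The names :sketch_worker and :sketch_no_such_name are made up for the example. It also shows one difference worth knowing: while sending to the pid of a dead process is silently ignored, sending to a name that isn't registered raises an ArgumentError.

```elixir
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

Process.register(pid, :sketch_worker)

found = Process.whereis(:sketch_worker)           # the pid we registered
missing = Process.whereis(:sketch_no_such_name)   # nil, nothing registered
listed? = :sketch_worker in Process.registered()

# Sending to an unregistered name raises instead of being ignored.
send_result =
  try do
    send(:sketch_no_such_name, :hello)
  rescue
    ArgumentError -> :not_registered
  end

IO.inspect({found == pid, missing, listed?, send_result})

# Let the worker exit.
send(:sketch_worker, :stop)
```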

Process.info/1 can be used to get information about a process. It takes a pid, not a registered name. Similarly, Process.info/2 takes a pid and a list of the specific fields we're interested in. Twenty or so fields are exposed by Process.info/1:

> Process.info(self())
[
  current_function: {Process, :info, 1},
  initial_call: {:proc_lib, :init_p, 5},
  status: :running,
  message_queue_len: 0,
  links: [],
  dictionary: [],
  trap_exit: false,
  error_handler: :error_handler,
  priority: :normal,
  group_leader: #PID<0.66.0>,
  total_heap_size: 13544,
  heap_size: 2586,
  stack_size: 51,
  reductions: 87142,
  garbage_collection: [
    max_heap_size: %{error_logger: true, kill: true, size: 0},
    min_bin_vheap_size: 46422,
    min_heap_size: 233,
    fullsweep_after: 65535,
    minor_gcs: 6
  ],
  suspending: []
]

> Process.info(self(), [:status, :message_queue_len, :reductions])
[status: :running, message_queue_len: 0, reductions: 140543]

message_queue_len and reductions are particularly useful if you're interested in monitoring the performance and efficiency of your processes. The first indicates the size of the mailbox, or how many messages are waiting to be received by the process. In most cases, you'll want message_queue_len to spend as much time at or near 0 as possible.

reductions can be viewed as an opaque unit of work. A process's reductions will continue to increment, so what we care about is the rate. (In reality, reductions is a counter which is incremented, roughly, on every function call. In current versions of the Erlang runtime, a context switch happens every 4000 reductions.) While the absolute count of reductions is less valuable than the rate (else long-running processes will always appear as "heavier"), we can quickly get a list of the processes with the highest reductions:

Process.list() # gets a list of all running pids
|> Enum.map(fn pid ->
  info = Process.info(pid, [:reductions, :registered_name])

  # not every process has a name, if it doesn't, fallback to the pid
  name = case info[:registered_name] do
    [] -> pid
    name -> name
  end

  %{name: name, reductions: info[:reductions]}
end)
|> Enum.sort_by(fn p -> p[:reductions] end, :desc)
|> Enum.take(10)
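Since the rate matters more than the absolute count, one rough way to approximate it is to take two snapshots a short interval apart and compare them. The sketch below does that. Sketch.Reductions, its default interval of 200 milliseconds and the busy loop used to generate load are all assumptions made for illustration.

```elixir
defmodule Sketch.Reductions do
  # Hypothetical helper: samples every process' reductions twice and returns
  # the processes whose count grew the most during the interval.
  def top(interval_ms \\ 200, limit \\ 5) do
    before = snapshot()
    Process.sleep(interval_ms)
    later = snapshot()

    later
    |> Enum.flat_map(fn {pid, reductions} ->
      case before do
        %{^pid => earlier} -> [{pid, reductions - earlier}]
        # the process was started during the interval, skip it
        _ -> []
      end
    end)
    |> Enum.sort_by(fn {_pid, delta} -> delta end, :desc)
    |> Enum.take(limit)
  end

  # Process.info/2 returns nil for a process that has already exited;
  # the pattern in the generator filters those out.
  defp snapshot() do
    for pid <- Process.list(),
        {:reductions, reductions} <- [Process.info(pid, :reductions)],
        into: %{},
        do: {pid, reductions}
  end
end

# A deliberately busy process, so that something stands out.
busy = spawn(fn -> Enum.each(Stream.cycle([1]), fn _ -> :ok end) end)

top = Sketch.Reductions.top(100, 3)
IO.inspect(top)

Process.exit(busy, :kill)
```

The busy process would be expected to appear at or near the top of the list, although the exact numbers will vary from run to run.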

There's another field in the process information that's interesting: dictionary. In the above sample output, it was just an empty list. However, if you run Process.info(self()) in an iex terminal, you'll get a non-empty list (try running it more than once!).

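Every process has an internal dictionary which can be used to store arbitrary values. The process dictionary can only be modified from within the process (although, as the Process.info output shows, its contents can be inspected from outside). You interact with it via Process.get/1, Process.put/2 and Process.delete/1, which always operate on the dictionary of the calling process: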

defmodule MyApp.Process do
  def run() do
    Process.put(:myapp_history, [])
    read_loop(0)
  end

  defp read_loop(3) do
    IO.inspect(history())
    read_loop(0)
  end

  defp read_loop(count) do
    receive do
      msg -> Process.put(:myapp_history, [msg | history()])
    end
    read_loop(count + 1)
  end

  defp history(), do: Process.get(:myapp_history)
end

pid = spawn &MyApp.Process.run/0

send(pid, 1)
send(pid, 2)
send(pid, 3)
IO.inspect(Process.get(:myapp_history))

Similar to iex, we use the process dictionary to keep a history. We print out the history on every 3rd message. Importantly, because every process has its own dictionary, the last line of the above snippet will print nil. Since it's being executed from a different process, it doesn't matter that we're using the same key, namely :myapp_history.

While process dictionaries can be useful in some cases, do note that they can make code hard to understand and maintain. Values stored in the process dictionary are essentially globals of the process, with the advantage that, being fully owned by and isolated to the process, they are, of course, thread safe.
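For comparison, here is a sketch of the same history-keeping process written without the process dictionary, where the state is passed explicitly as arguments to the loop. Sketch.History and the {:history, ...} reply message are hypothetical. Instead of printing, the process sends the history back so the caller can inspect it.

```elixir
defmodule Sketch.History do
  # Hypothetical rewrite: history and count travel as function arguments
  # rather than living in the process dictionary.
  def run(reply_pid), do: read_loop([], 0, reply_pid)

  # On every 3rd message, report the history and reset the count.
  defp read_loop(history, 3, reply_pid) do
    send(reply_pid, {:history, history})
    read_loop(history, 0, reply_pid)
  end

  defp read_loop(history, count, reply_pid) do
    receive do
      msg -> read_loop([msg | history], count + 1, reply_pid)
    end
  end
end

parent = self()
pid = spawn(fn -> Sketch.History.run(parent) end)

send(pid, 1)
send(pid, 2)
send(pid, 3)

history =
  receive do
    {:history, history} -> history
  after
    1_000 -> raise "timeout"
  end

IO.inspect(history)
```

This should print [3, 2, 1]. Everything the loop depends on is visible in its arguments, which tends to make this style easier to follow and test.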

Processes and Modules

This might be obvious, but it's important to be clear: processes and modules aren't tied. Given the following:

defmodule MyApp.Calculator do
  def start() do
    pid = spawn &loop/0
    Process.register(pid, :calculator)
  end

  defp loop() do
    receive do
      {:add, a, b, reply_pid} -> send(reply_pid, a + b)
    end
    loop()
  end

  def add(a, b) do                              # 1
    send(:calculator, {:add, a, b, self()})     # 1
    receive do                                  # 1
      sum -> sum                                # 1
    after                                       # 1
      1000 -> raise "timeout"                   # 1
    end                                         # 1
  end                                           # 1
end

MyApp.Calculator.start()                        # 1
IO.puts MyApp.Calculator.add(1, 3)              # 1

It's important to know what's being run from what process. Everything commented with # 1 is running in our initial process. This includes the add/2 function, even though it's located in the MyApp.Calculator module. While GenServers (to be discussed in greater detail in a later part) often give the impression that modules and processes map 1 to 1, this isn't the case.
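One way to convince yourself of this is to have functions report self/0. In the sketch below, Sketch.WhoAmI and its :whoami message are invented for the example. Two public functions live in the same module, but only the loop runs in the spawned process.

```elixir
defmodule Sketch.WhoAmI do
  # Hypothetical module, for illustration only.
  def start(), do: spawn(&loop/0)

  # Runs in the spawned process.
  defp loop() do
    receive do
      {:whoami, reply_pid} -> send(reply_pid, {:server_pid, self()})
    end

    loop()
  end

  # Runs in whichever process calls it.
  def caller_pid(), do: self()

  # Also runs in the caller; only the reply is computed in the server.
  def server_pid(server) do
    send(server, {:whoami, self()})

    receive do
      {:server_pid, pid} -> pid
    after
      1_000 -> raise "timeout"
    end
  end
end

server = Sketch.WhoAmI.start()
caller = Sketch.WhoAmI.caller_pid()
reported = Sketch.WhoAmI.server_pid(server)

IO.inspect({caller == self(), reported == server, caller == server})
```

This should print {true, true, false}: the module's public functions execute in the calling process, and only the code reached from loop/0 executes in the spawned one.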

tl;dr

Every Elixir process has a mailbox and all communication between processes happens via messages sent to and received from mailboxes. Because a process can only operate on one message at a time, and because processes are isolated (do not share any of their data/state), processes do not require any application-level concurrency control.