Elixir, A Little Beyond The Basics - Part 8: GenServers

Oct 23, 2021

GenServers take what we've already learnt about processes and wrap it in a developer-friendly interface. To master GenServers, you must simply understand the fundamentals of processes, namely the mailbox and how processes interact with it. Still, looking at these fundamentals from the perspective of GenServers is a useful exercise.

The first thing that stands out about GenServers is that we don't generally interact with the mailbox directly. Instead, to send a message to a GenServer we use GenServer.cast/2 or GenServer.call/2, and to receive, we implement handle_cast/2, handle_call/3 and, sometimes, handle_info/2. Here's a basic counter in action:

defmodule MyApp.Counter do
  use GenServer
  @name __MODULE__

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: @name)
  end

  def increment(n) do
    GenServer.call(@name, {:increment, n})
  end

  def init(_opts), do: {:ok, 0}

  def handle_call({:increment, n}, _from, state) do
    state = state + n
    {:reply, state, state}
  end
end

GenServers are usually started by Supervisors, which will call the GenServer's start_link/0 or start_link/1 function. The process will be spawned and init/1 will be called; this is where the initial state of our process is set (the initial state of our counter is 0). Internally, the GenServer will receive messages and call handle_cast/2, handle_call/3 or handle_info/2 depending on the received message. In pure-process terms, you can think of the last part as:

defp receive_loop(state) do
  new_state =
    receive do
      msg -> handle_call(msg, state)
    end
  receive_loop(new_state)
end
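The supervised startup described above can be sketched as follows. This is a minimal, self-contained example rather than the only way to wire things up: SupervisedCounter is a hypothetical copy of the counter, and the supervisor relies on the child_spec/1 that use GenServer generates, which calls start_link/1.

```elixir
defmodule SupervisedCounter do
  use GenServer

  # Called by the supervisor through the generated child_spec/1.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def increment(n), do: GenServer.call(__MODULE__, {:increment, n})

  @impl true
  def init(_opts), do: {:ok, 0}

  @impl true
  def handle_call({:increment, n}, _from, state) do
    state = state + n
    {:reply, state, state}
  end
end

# {SupervisedCounter, []} means "start it with start_link([])".
children = [{SupervisedCounter, []}]
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

IO.inspect(SupervisedCounter.increment(2)) # → 2
IO.inspect(SupervisedCounter.increment(3)) # → 5
```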

We can better understand the interaction between the sender and the GenServer by looking at the different messages that send/2, GenServer.cast/2 and GenServer.call/2 generate (if you're surprised that we can use call/2 and cast/2 on a plain process id, remember, GenServers are just processes!):

pid = spawn fn ->
  receive do msg -> IO.inspect(msg) end
  receive do msg -> IO.inspect(msg) end
  receive do msg -> IO.inspect(msg) end
end
send(pid, :hello)
GenServer.cast(pid, :hello)
GenServer.call(pid, :hello)

The above will output something like:

# send
:hello

# cast
{:"$gen_cast", :hello}

# call
{ :"$gen_call",
  {#PID<0.110.0>, [:alias | #Reference<0.426212949.4092919813.244647>]},
  :hello
}

# what happened here?!
** (exit) exited in: GenServer.call(#PID<0.116.0>, :hello, 5000)
    ** (EXIT) normal
    (elixir 1.12.0) lib/gen_server.ex:1024: GenServer.call/3

If you forget about the error at the end, you can hopefully start to see how GenServers receive messages and route them to the correct handle_XYZ function. Something like:

defp receive_loop(state) do
  new_state =
    receive do
      {:"$gen_cast", msg} -> handle_cast(msg, state)
      # from is {caller_pid, tag}; it is passed through so the reply can be addressed
      {:"$gen_call", from, msg} -> handle_call(msg, from, state)
      msg -> handle_info(msg, state)
    end
  receive_loop(new_state)
end
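To make the routing concrete, here is a sketch of a plain process that answers GenServer.cast/2 and GenServer.call/2 by matching on those message shapes itself. PlainCounter is a hypothetical name, and the loop is deliberately simplified (no monitors, no error handling); it uses GenServer.reply/2 to send the reply back to the caller.

```elixir
defmodule PlainCounter do
  # A plain spawned process, not a GenServer.
  def start(initial), do: spawn(fn -> loop(initial) end)

  defp loop(state) do
    new_state =
      receive do
        # what GenServer.cast/2 sends
        {:"$gen_cast", {:increment, n}} ->
          state + n

        # what GenServer.call/2 sends; `from` identifies the caller
        {:"$gen_call", from, :get} ->
          GenServer.reply(from, state)
          state

        # anything else (for example a plain send/2) is ignored here
        _other ->
          state
      end

    loop(new_state)
  end
end

pid = PlainCounter.start(0)
:ok = GenServer.cast(pid, {:increment, 5})
IO.inspect(GenServer.call(pid, :get)) # → 5
```

Because this process keeps looping and replies to the call, the caller gets its answer instead of the exit we saw earlier.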

What about that error when we used GenServer.call/2? We saw in part 6 that, for synchronous messages to work, we need to send an asynchronous message and then block waiting for another asynchronous message (the reply). For this to work reliably, the caller monitors the target process. In our code above, the spawned function exits after receiving the third message. The caller detects this (thanks to the monitor that GenServer.call/2 set up) and exits with an error. What if our process didn't exit after receiving the message?

pid = spawn fn ->
  receive do msg -> IO.inspect(msg) end
  :timer.sleep(:timer.minutes(1))
end
GenServer.call(pid, :hello)

If you run the above and wait 5 seconds, you'll get a similar error:

** (exit) exited in: GenServer.call(#PID<0.115.0>, :hello, 5000)
    ** (EXIT) time out
    (elixir 1.12.0) lib/gen_server.ex:1024: GenServer.call/3

Not only does our caller monitor the target process, it also sets a timeout (defaulting to 5 seconds) to receive the reply.
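The timeout is the optional third argument of GenServer.call/3. The sketch below uses a hypothetical SlowEcho server that sleeps before replying. The try/catch is there only to show the shape of the exit reason; it is not a recommendation to rescue timeouts.

```elixir
defmodule SlowEcho do
  use GenServer

  @impl true
  def init(_opts), do: {:ok, nil}

  @impl true
  def handle_call({:echo, value, delay_ms}, _from, state) do
    # Simulate slow work before replying.
    Process.sleep(delay_ms)
    {:reply, value, state}
  end
end

{:ok, pid} = GenServer.start_link(SlowEcho, nil)

# The reply takes about 50ms and we allow 1000ms, so this succeeds.
fast = GenServer.call(pid, {:echo, :hi, 50}, 1_000)

# The reply takes about 200ms but we only allow 20ms, so the caller exits.
slow =
  try do
    GenServer.call(pid, {:echo, :late, 200}, 20)
  catch
    :exit, {:timeout, _details} -> :timed_out
  end

IO.inspect({fast, slow}) # → {:hi, :timed_out}
```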

Phantom Reads

The timeout behavior of GenServer.call/2 represents one of the most dangerous and leaky aspects of GenServers (and processes in general): mailboxes can accumulate unexpected messages. In its simplest form, you can just spawn a process which does some arbitrary (non-mailbox related) work, and send messages to it:

pid = spawn fn ->
  # do some work
end

send(pid, "i am")
send(pid, :flooding)
send(pid, "your mailbox")
send(pid, 5555)

When a process dies, its mailbox goes with it, so it's not a problem. But for long-running processes, you have to be careful. You can accumulate an ever-growing list of messages (we previously saw that monitoring the message_queue_len of processes is a good idea!). It's particularly problematic when calling a GenServer and getting a timeout. Here's a naive, but representative, version of what happens:

pid = spawn fn ->
  receive do
    {:add, pid, a, b} ->
      :timer.sleep(1000)
      send(pid, {:sum, a+b})
  end
end

send(pid, {:add, self(), 8, 11})
receive do
  {:sum, n} -> IO.puts("sum: #{n}")
after
  100 -> IO.puts("timeout")
end

# what happens if we receive again here, without the timeout?

The timeout is purely a behaviour of the caller. The spawned process knows nothing about the timeout; it gets the message and goes about processing it, eventually sending a reply which goes to the caller's mailbox. But the caller has moved on. If it uses receive in the future, it'll get the reply it decided had timed out.
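Here is a runnable sketch of that sequence with plain processes. The delays (200ms of work, a 50ms timeout) are arbitrary values chosen so the first receive reliably gives up before the reply arrives.

```elixir
worker =
  spawn(fn ->
    receive do
      {:add, caller, a, b} ->
        # Slow work: the reply is sent long after the caller has given up.
        Process.sleep(200)
        send(caller, {:sum, a + b})
    end
  end)

send(worker, {:add, self(), 8, 11})

first =
  receive do
    {:sum, n} -> {:ok, n}
  after
    50 -> :timeout
  end

# Later, a receive that happens to match the same shape picks up
# the reply we had already written off.
second =
  receive do
    {:sum, n} -> {:stale_reply, n}
  after
    1_000 -> :nothing
  end

IO.inspect({first, second}) # → {:timeout, {:stale_reply, 19}}
```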

With GenServers, if call/2 fails, the caller will crash (because GenServer.call/2 exits). In our above snippet, replace the IO.puts("timeout") with exit(:timeout) and you get a sense for what the GenServer will do. Messages sent to our now dead process are just discarded. But sometimes, you'll be tempted to put a try/catch around a GenServer.call/2, maybe specifically so that it doesn't crash on a timeout. In such cases, the caller's mailbox can get pretty messy. So only prevent the caller from crashing when you're sure of what you're doing.

Blocking Init

When we looked at supervisors, we mentioned that each child of the supervisor is initialized sequentially. This creates predictability and the ability for processes to depend on each other. But the downside is that if process A is slow to start, or crashes, the entire chain is blocked.

The rule is: don't do anything slow or risky in your GenServer's init function. But that isn't always practical. GenServers have a reasonably elegant solution to this: handle_continue/2. We can change our init function to return {:ok, INITIAL_STATE, {:continue, CONTINUE_TYPE}} which will both unblock the initialization and guarantee that handle_continue/2 is called before any other message is processed.

defmodule MyApp.Counter do
  use GenServer
  @name __MODULE__

  def init(_opts), do: {:ok, nil, {:continue, :load}}

  def handle_continue(:load, _initial_state) do
    # maybe load a saved state from the db? 0 is a placeholder for the loaded value
    new_state = 0
    {:noreply, new_state}
  end
end

The value we return in {:continue, VALUE} is the value that's passed to handle_continue/2 (:load in the above), allowing more complicated cases where we might have different handle_continue implementations.
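As a sketch of multiple handle_continue/2 clauses, the hypothetical ContinueDemo module below chains two continue steps. The sleep and the hard-coded numbers stand in for real loading work. A call sent right after start_link/1 returns is still handled only after both steps have run.

```elixir
defmodule ContinueDemo do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(_opts), do: {:ok, nil, {:continue, :load}}

  @impl true
  def handle_continue(:load, _state) do
    # Stand-in for a slow load; start_link/1 has already returned.
    Process.sleep(100)
    {:noreply, 40, {:continue, :adjust}}
  end

  def handle_continue(:adjust, state) do
    # A second step, selected by the continue value.
    {:noreply, state + 2}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

{:ok, pid} = ContinueDemo.start_link([])

# This message is already waiting in the mailbox while :load runs.
IO.inspect(GenServer.call(pid, :get)) # → 42
```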

send / send_after

While we usually interact with GenServers using call/2 and cast/2, you can use send/2 and Process.send_after/3 as well. These messages get handled by handle_info/2. This is mostly useful with send_after/3 within your own code. But you'll also run into libraries that use send/2 since they don't want to assume that you're using a GenServer versus a plain process. The built-in :gen_tcp library comes to mind.
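A common use of Process.send_after/3 is a GenServer that schedules work for itself. The sketch below is one way to do it, with a hypothetical Ticker module and a :tick message invented for illustration.

```elixir
defmodule Ticker do
  use GenServer

  def start_link(interval_ms), do: GenServer.start_link(__MODULE__, interval_ms)
  def count(pid), do: GenServer.call(pid, :count)

  @impl true
  def init(interval_ms) do
    schedule(interval_ms)
    {:ok, %{interval: interval_ms, ticks: 0}}
  end

  # Plain messages (send/2, Process.send_after/3) arrive here.
  @impl true
  def handle_info(:tick, state) do
    schedule(state.interval)
    {:noreply, %{state | ticks: state.ticks + 1}}
  end

  @impl true
  def handle_call(:count, _from, state), do: {:reply, state.ticks, state}

  # Ask the runtime to deliver :tick to this process after `ms` milliseconds.
  defp schedule(ms), do: Process.send_after(self(), :tick, ms)
end

{:ok, pid} = Ticker.start_link(10)
Process.sleep(100)
IO.inspect(Ticker.count(pid) >= 1)
```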

Concurrency

We covered process concurrency at length when talking about processes, in part 6, but let's see what this means from a GenServer's perspective.

defmodule MyApp.Counter do
  use GenServer

  # ...
  # start_link, init
  # ...

  def get() do
    GenServer.call(@name, :get)
  end

  def increment(n) do
    GenServer.cast(@name, {:increment, n})
  end

  def handle_call(:get, _from, state) do
    {:reply, state, state}
  end

  def handle_cast({:increment, n}, state) do
    {:noreply, state + n}
  end
end

Now let's pretend that we have a Phoenix web app which is using this GenServer. Of course, Phoenix can process requests concurrently. Is it a problem if 2 (or 3 or 1000) web requests access this GenServer "at the same time"? Did you notice that one of these is synchronous (call/2) and one is asynchronous (cast/2)? Does that change anything?

Absolutely not, and not only because our behavior is simple. A process, and by extension a GenServer, processes a single message at a time. In non-Elixir terms, the state, the value of our counter, is only ever read from or written to by a single thread. No exceptions.
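A small experiment illustrates this. It uses a hypothetical ConcurrentCounter and 1000 tasks standing in for web requests. The increment here uses call/2 rather than cast/2 so that every task knows its increment was applied before we read the total.

```elixir
defmodule ConcurrentCounter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  def increment(n), do: GenServer.call(__MODULE__, {:increment, n})
  def get, do: GenServer.call(__MODULE__, :get)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call({:increment, n}, _from, state), do: {:reply, :ok, state + n}
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

{:ok, _pid} = ConcurrentCounter.start_link([])

# 1000 processes all hitting the same GenServer concurrently.
1..1000
|> Enum.map(fn _ -> Task.async(fn -> ConcurrentCounter.increment(1) end) end)
|> Enum.each(&Task.await/1)

# No increment is lost: messages are handled one at a time.
IO.inspect(ConcurrentCounter.get()) # → 1000
```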

You will note that handle_call/3 and handle_cast/2 are public, so we could call these directly:

MyApp.Counter.handle_cast({:increment, 10}, 10)

This is just a normal function call and will execute within the process of the caller, not the GenServer. state is just the second argument that we pass, 10. Calling it like this doesn't expose the state of the MyApp.Counter GenServer. The function returns {:noreply, 20}, but that's just a return value, it doesn't alter the state of the MyApp.Counter GenServer.

Is it useful to call a handle_call/cast/info function directly like this? I think it can make sense in unit tests which are concerned with specific behavior of the function. This can focus the test on specific behavior and significantly minimize the work needed to set everything up. A pretty big testing win in my book.
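As a sketch of what such a test might look like with ExUnit, the example below calls the callbacks as ordinary functions. No process is started. CounterCallbacks is a hypothetical stand-in for the counter, and nil is passed for the unused from argument.

```elixir
ExUnit.start()

defmodule CounterCallbacks do
  use GenServer

  @impl true
  def init(_opts), do: {:ok, 0}

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}

  @impl true
  def handle_cast({:increment, n}, state), do: {:noreply, state + n}
end

defmodule CounterCallbacksTest do
  use ExUnit.Case, async: true

  test "handle_cast/2 adds n to whatever state it is given" do
    assert CounterCallbacks.handle_cast({:increment, 10}, 10) == {:noreply, 20}
  end

  test "handle_call/3 replies with the state and leaves it unchanged" do
    # `from` is unused by this clause, so nil is enough here.
    assert CounterCallbacks.handle_call(:get, nil, 7) == {:reply, 7, 7}
  end
end
```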