Elixir, A Little Beyond The Basics - Part 4: structures

Oct 09, 2021

In languages with mutable data, structures tend to be arranged in a contiguous block of memory. A structure's fields act a bit like an array index: i.e. an offset from the start of the structure. Say we're given the following structure:

// Go
type User struct {
  id int32
  active bool
  name string
}

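The id is found at offset 0 and active is found at offset 4. If the fields were packed tightly, name would follow at offset 5. This assumes that an int32 takes 4 bytes and a bool takes 1 byte (which, for various reasons, might not always be the case). In practice, compilers usually pad fields for alignment, which is why the output further down shows the data for name starting at offset 8.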

The following code creates a user, gets the underlying bytes and changes our 5th byte (which is where we expect the data for active to be found), from 1 to 0.

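// Go
package main

import (
  "fmt"
  "unsafe"
)

// User is the struct defined above.
var SIZE_OF_USER = unsafe.Sizeof(User{})

func main() {
  u1 := User{
    id:     10,
    active: true,
    name:   "Goku",
  }
  fmt.Println(u1)

  // view the memory backing u1 as a byte slice
  data := (*(*[1<<31 - 1]byte)(unsafe.Pointer(&u1)))[:SIZE_OF_USER]
  fmt.Println(data)
  // index 4 is the 5th byte, where active is stored
  data[4] = 0
  fmt.Println(u1)
}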

The output shows that we changed the value of the active field:

{10 true Goku}
[10 0 0 0 1 0 0 0 140 72 73 0 0 0 0 0 4 0 0 0 0 0 0 0]
{10 false Goku}

The point that I'm trying to make is that, in some languages, manipulating structures is very efficient. However, such efficiency requires mutability. As we saw in part 1 with lists, and as we'll see in the next part with tuples, for the above implementation to work with immutable structures, we'd have to clone the data before being able to change it. This would drastically degrade write performance.

This is why, as you probably already know, Elixir structures are implemented as maps. Maps can be efficiently read from and written to, are immutable and the keys can be any term, such as atoms which make good field names. While the map implementation is amazingly efficient given that we get immutable data, we must acknowledge that it's not as fast as what you'll get from most mutable structures, such as Go's implementation shown above.
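To make the immutability point concrete, here is a minimal sketch using a plain map (the variable names are only for illustration). Writing to the map gives us a new map back, and the original is left untouched:

```elixir
# A plain map standing in for a structure's fields.
original = %{id: 10, active: true, name: "Goku"}

# Map.put/3 returns a new map; `original` is not modified.
updated = Map.put(original, :active, false)

IO.inspect(original.active) # true
IO.inspect(updated.active)  # false
```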

Technically, a structure in Elixir is simply a map with the special :__struct__ key, whose value is the module containing the structure definition:

defmodule MyApp.User do
  defstruct [:id, :active, :name]
end

u = %{__struct__: MyApp.User, id: 10, active: true, name: "Goku"}
case u do
  %MyApp.User{} -> IO.puts("Yes")
  _ -> IO.puts("No")
end

The above outputs Yes because a structure is really just a map with the __struct__ key.

Now, that doesn't mean that structures are without value. But it does mean that, if you want to, you can often treat a struct just like a map. In the above code, we could add any other key to u, either when we create the structure or later via Map.put/3, and it would still be a MyApp.User structure (by that I mean that it would still pattern match against %MyApp.User{}).
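As a small sketch of the above, we can poke at a struct with the regular map functions. The :hack key is just an arbitrary extra key chosen for illustration:

```elixir
defmodule MyApp.User do
  defstruct [:id, :active, :name]
end

u = %MyApp.User{id: 10, active: true, name: "Goku"}

# A struct passes the is_map/1 guard and exposes :__struct__ like any other key.
IO.inspect(is_map(u))                   # true
IO.inspect(u.__struct__)                # MyApp.User
IO.inspect(Enum.sort(Map.keys(u)))      # [:__struct__, :active, :id, :name]

# Adding an undeclared key via Map.put/3 still leaves us with something
# that pattern matches as a MyApp.User.
u2 = Map.put(u, :hack, true)
IO.inspect(match?(%MyApp.User{}, u2))   # true
IO.inspect(Map.fetch!(u2, :hack))       # true
```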

So what value do structures actually provide? For the most part, they provide some compile-time error checking. It's true that we can do:

u = %{__struct__: MyApp.User, id: 10, active: true, name: "Goku", hack: true}

But if we try to do:

u = %MyApp.User{id: 10, active: true, name: "Goku", hack: true}

We'll get a compile-time error. Similar errors happen if we try to pattern match with an unknown field, e.g. %MyApp.User{hack: true}. Finally, you might have seen the special map update syntax before:

data = %{a: 0, b: 1, c: 3}
data = %{data | a: 1, b: 2}

This will only update data if it already contains the keys being updated. In this case, we're updating the keys a and b, which are existing keys, so everything will be ok. If we tried to do either of the following, we'd get a KeyError:

%{data | z: 9}

# OR

%{data | a: 1, b: 2, z: 9}
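If you want to see that KeyError for yourself, the following is a minimal sketch. UpdateDemo and set_z/1 are hypothetical names made up for this example; the helper applies the update syntax and turns the exception into a tuple:

```elixir
defmodule UpdateDemo do
  # Hypothetical helper: tries to update the :z key with the update syntax
  # and reports which key was missing if a KeyError is raised.
  def set_z(map) do
    {:ok, %{map | z: 9}}
  rescue
    e in KeyError -> {:error, e.key}
  end
end

data = %{a: 0, b: 1, c: 3}

missing = UpdateDemo.set_z(data)
present = UpdateDemo.set_z(%{z: 0})

IO.inspect(missing) # {:error, :z}
IO.inspect(present) # {:ok, %{z: 9}}
```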

But these errors happen at runtime. If we do this with a structure, we'll get a compile-time error:

u = %MyApp.User{id: 10, active: true, name: "Goku"}

# this works ok, active is a valid key
u = %MyApp.User{u | active: false}

# this will give a compile-time error:
u = %MyApp.User{u | z: 9}
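Since a compile-time error stops a script from running at all, one way to observe these checks is to hand the snippets to Code.eval_string/2, which compiles a string before evaluating it. This is only a rough sketch: CompileCheck and try_snippet/2 are hypothetical names, and because the exact exception can differ between Elixir versions, the helper rescues any exception rather than a specific one.

```elixir
defmodule MyApp.User do
  defstruct [:id, :active, :name]
end

defmodule CompileCheck do
  # Hypothetical helper: compiles and evaluates a snippet, and reports
  # whether it was accepted or raised along the way.
  def try_snippet(source, bindings \\ []) do
    Code.eval_string(source, bindings)
    :accepted
  rescue
    _error -> :rejected
  end
end

u = %MyApp.User{id: 10, active: true, name: "Goku"}

# Updating a declared field is fine.
valid_update = CompileCheck.try_snippet("%MyApp.User{u | active: false}", u: u)

# Creating or updating a struct with an undeclared field is not.
unknown_on_create = CompileCheck.try_snippet("%MyApp.User{id: 1, hack: true}")
unknown_on_update = CompileCheck.try_snippet("%MyApp.User{u | z: 9}", u: u)

IO.inspect({valid_update, unknown_on_create, unknown_on_update})
```

Note that the plain map version, with :__struct__ written out by hand, goes through none of these checks.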

Finally, one significant difference between structs and maps is that structs do not inherit the protocols or behaviours which exist for maps. So while our MyApp.User structure is implemented as a map, and we kind of treat it like a map, it still represents a MyApp.User. If we want our structure to implement a Size protocol, it's up to us to define it: it would not make sense for Elixir to assume that the size of a MyApp.User is the number of fields.
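Here is a minimal sketch of that last point. The Size protocol below is hypothetical (it's the one named above, not something Elixir ships with), and we only implement it for Map. Calling it with a plain map works, but calling it with our structure raises a Protocol.UndefinedError, since the Map implementation isn't used for structs:

```elixir
defmodule MyApp.User do
  defstruct [:id, :active, :name]
end

# Hypothetical protocol, defined only for this example.
defprotocol Size do
  def size(data)
end

# Implemented for plain maps only.
defimpl Size, for: Map do
  def size(map), do: map_size(map)
end

u = %MyApp.User{id: 10, active: true, name: "Goku"}

map_result = Size.size(%{a: 1, b: 2})

# The struct does not pick up the Map implementation.
struct_result =
  try do
    Size.size(u)
  rescue
    e in Protocol.UndefinedError -> {:error, e.protocol}
  end

IO.inspect(map_result)    # 2
IO.inspect(struct_result) # {:error, Size}
```

If we did want a size for our users, we'd add a defimpl Size, for: MyApp.User and decide for ourselves what that size should mean.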