Reading a JSON config in Zig

Mar 06, 2023

This post was updated on May 18 2023, to reflect changes to std.json currently on the master branch. The original version of the code is commented out.

A while ago, I wrote a Websocket Server implementation in Zig. I did it just for fun, which means I didn't have to worry about the boilerplate things that go into creating an actual production application.

More recently, I've jumped back into Zig for something a little more serious (but not really). As part of my learning journey, I wanted to add more polish. One of the things I wanted to do was add configuration options, which meant reading a config.json file.

This seemingly simple task presented me with enough challenges that I thought it might be worth a blog post. The difficulties that I ran into are threefold. First, Zig's documentation is, generously, experimental. Second, Zig changes enough from version to version that whatever external documentation I did find didn't work as-is (I'm writing this using 0.11-dev). Finally, decades of being coddled by garbage collectors have dulled my education.

Here's the skeleton that we'll start with:

const std = @import("std");

pub fn main() !void {
  const config = try readConfig("config.json");
  std.debug.print("config.root: {s}\n", .{config.root});
}

fn readConfig(path: []const u8) !Config {
  //TODO: all our logic here!

  // since we're not using path yet, we need this to satisfy the compiler
  _ = path;
  return error.NotImplemented;
}

const Config = struct {
  root: []const u8,
};

It would be a nice addition to be able to specify the path/filename of the configuration via a command line option, such as --config config.json, but Zig has no built-in flag parser, so we'd have to fiddle around with std.os.argv or use a third party library.

Also, as a very quick summary of Zig's error handling, the try function(...) syntax is like Go's if err != nil { return err }. Note that our main function returns !void and our readConfig returns !Config. This is called an error union type and it means our functions return either an error or the specified type. We can list an explicit error set (like an enum), in the form of ErrorType!Type. If we want to support any error, we can use the special anyerror type. Finally, we can have Zig infer the error set via !Type which, practically speaking, is similar to using anyerror, but is actually just shorthand for an explicit error set that is inferred at compile-time.
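To make the difference between an explicit and an inferred error set a bit more concrete, here's a minimal sketch. The ConfigError set and the two check functions are made up for illustration; they aren't part of the config code we're building.

```zig
const std = @import("std");

// Hypothetical error set, for illustration only.
const ConfigError = error{ Empty, TooLong };

// Explicit error set: this function can only return errors in ConfigError.
fn checkExplicit(path: []const u8) ConfigError!usize {
    if (path.len == 0) return error.Empty;
    if (path.len > 255) return error.TooLong;
    return path.len;
}

// Inferred error set: the compiler works out the set from the body.
fn checkInferred(path: []const u8) !usize {
    // `try` hands the error back to our caller if checkExplicit fails
    return try checkExplicit(path);
}

pub fn main() void {
    // `catch` handles the error here instead of propagating it
    const n = checkInferred("") catch |err| {
        std.debug.print("failed: {s}\n", .{@errorName(err)});
        return;
    };
    std.debug.print("len: {d}\n", .{n});
}
```

Since main handles the error with catch, it can return plain void rather than !void.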

Reading a File

The first thing we need to do is read the contents of our file (after which we can parse it). We could explore the available API, but it should be somewhat obvious that this is going to require allocating memory. One of Zig's key features is explicit memory allocation. This means that any part of the standard library (and hopefully 3rd party libraries) that might allocate memory takes a parameter of type std.mem.Allocator. In other words, instead of having a readFile(path) ![]u8 function which would allocate memory internally using, say, malloc, we'd expect to find a readFile(allocator, path) ![]u8 which would use the supplied allocator.

Hopefully this will make more sense once we look at a concrete example. For now, we'll create an allocator in main and pass it to readConfig:

const std = @import("std");
const Allocator = std.mem.Allocator;

pub fn main() !void {
  // we have a few choices, but this is a safe default
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

  const config = try readConfig(allocator, "config.json");
  std.debug.print("config.root: {s}\n", .{config.root});
}

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  //TODO: all our logic here!

  // since we're not using path yet, we need this to satisfy the compiler
  _ = path;
  _ = allocator;
  return error.NotImplemented;
}
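Before reading the file, here's a small sketch of what the "takes an allocator" convention looks like from both sides. The configPath helper is hypothetical (it's not in the standard library, I made it up for this example); std.fmt.allocPrint is the real standard library function doing the allocation.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

// Hypothetical helper: builds "<dir>/<name>". It needs memory for the
// result, so it takes an allocator instead of allocating behind our back.
fn configPath(allocator: Allocator, dir: []const u8, name: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "{s}/{s}", .{ dir, name });
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    const path = try configPath(allocator, "/etc/demo", "config.json");
    // the caller owns the returned memory and has to free it
    defer allocator.free(path);

    std.debug.print("{s}\n", .{path});
}
```

The function signature alone tells us that configPath may allocate, and the caller decides where that memory comes from.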

Now to actually read the file, there's a Dir type in the standard library that exposes a readFile and a readFileAlloc method. I find this API a little confusing (though I'm sure there's a reason for it), as these aren't static functions but members of the Dir type. So we need a Dir. Thankfully, we can easily get the Dir of the current working directory using std.fs.cwd().

If we use readFileAlloc, our code looks like:

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);
  ...
}

The last parameter, 512, is the maximum size to allocate/read. Our config is very small, so we're limiting this to 512 bytes. Importantly, this function allocates and returns memory using our provided allocator. We're responsible for freeing this memory, which we'll do when the function returns, using defer.

Alternatively, we could use readFile, which takes and fills in a []u8 instead of an allocator. Using this function, our code looks like:

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  const buffer = try allocator.alloc(u8, 512);
  // defer the free right away, so buffer isn't leaked if readFile fails
  defer allocator.free(buffer);
  const data = try std.fs.cwd().readFile(path, buffer);
  ...
}
In the above code, data is a slice of buffer which is fitted to the size of the file. In our specific case, it doesn't matter if you use readFileAlloc or readFile. I find the first one simpler. The benefit of readFile, though, is the ability to re-use buffers.
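As a rough sketch of that re-use, the buffer doesn't even have to come from an allocator: a fixed-size array on the stack works too. The readSmallFile helper and the second file name are hypothetical, just for illustration.

```zig
const std = @import("std");

// Hypothetical helper: reads `path` into a caller-supplied buffer and
// returns the slice of `buffer` that was actually filled.
fn readSmallFile(dir: std.fs.Dir, path: []const u8, buffer: []u8) ![]u8 {
    return dir.readFile(path, buffer);
}

pub fn main() !void {
    // one 512-byte buffer on the stack, no allocator involved
    var buffer: [512]u8 = undefined;

    // hypothetical file names, for illustration only
    const names = [_][]const u8{ "config.json", "config.local.json" };
    for (names) |name| {
        const data = readSmallFile(std.fs.cwd(), name, &buffer) catch |err| {
            std.debug.print("{s}: {s}\n", .{ name, @errorName(err) });
            continue;
        };
        // data points into buffer, so it's only valid until the next read
        std.debug.print("{s}: {d} bytes\n", .{ name, data.len });
    }
}
```

The catch is that anything we want to keep past the next read has to be copied out of the buffer first.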

It turns out that reading a file is straightforward. However, if you're coming from a garbage collected mindset, it's hopefully apparent that [manual] memory management is a significant responsibility. If we did forget to free data, and we had tests for readConfig that used std.testing.allocator, Zig's test runner would detect the leak and automatically fail the test. The output would look something like:

zig build test
Test [3/10] test.readConfig... [gpa] (err): memory address 0x1044f8000 leaked:
/opt/zig/lib/std/array_list.zig:391:67: 0x10436fa5f in ensureTotalCapacityPrecise (test)
    const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/opt/zig/lib/std/array_list.zig:367:51: 0x104366c2f in ensureTotalCapacity (test)
    return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/opt/zig/lib/std/io/reader.zig:74:56: 0x104366607 in readAllArrayListAligned__anon_3814 (test)
    try array_list.ensureTotalCapacity(math.min(max_append_size, 4096));
                                                       ^
/opt/zig/lib/std/fs/file.zig:959:46: 0x10436637b in readToEndAllocOptions__anon_3649 (test)
    self.reader().readAllArrayListAligned(alignment, &array_list, max_bytes) catch |err| switch (err) {
                                             ^
/opt/zig/lib/std/fs.zig:2058:42: 0x104365d0f in readFileAllocOptions__anon_3466 (test)
    return file.readToEndAllocOptions(allocator, max_bytes, stat_size, alignment, optional_sentinel);
                                         ^
/opt/zig/lib/std/fs.zig:2033:41: 0x104365a0f in readFileAlloc (test)
    return self.readFileAllocOptions(allocator, file_path, max_bytes, null, @alignOf(u8), null);
                                        ^
/code/demo.zig:13:41: 0x10436745f in readConfig (test)
    const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
                                        ^
/code/demo.zig:30:13: 0x104369547 in test.readConfig (test)
    const config = try readConfig(allocator, "config.json");
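For reference, a test that gets this kind of leak detection can be as small as the sketch below. It assumes the test allocates through std.testing.allocator; loadBytes is a hypothetical stand-in for readConfig so the example doesn't need a file on disk.

```zig
const std = @import("std");

// Hypothetical stand-in for readConfig: returns memory the caller must free.
fn loadBytes(allocator: std.mem.Allocator) ![]u8 {
    return allocator.dupe(u8, "{\"root\": \"/tmp/demo\"}");
}

test "loadBytes does not leak" {
    // std.testing.allocator tracks every allocation made through it
    const allocator = std.testing.allocator;
    const data = try loadBytes(allocator);
    // delete this line and `zig test` reports the leak and fails the test
    defer allocator.free(data);
    try std.testing.expect(data.len > 0);
}
```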

Parsing JSON

We've read our file and have a string, which in Zig is a []u8. Populating a Config struct is just one more line (two on older versions of Zig):

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);

  // Use the following two lines if you're using a version of Zig prior to
  // 0.11.0-dev.3187+40e8c2243 (Mid May 2023)
  // var stream = std.json.TokenStream.init(data);
  // return try std.json.parse(Config, &stream, .{.allocator = allocator});

  // This is the new Zig 0.11 API (available on master as of Mid May 2023)
  // the last arguments are parse options, here we're using the defaults
  return try std.json.parseFromSlice(Config, allocator, data, .{});
}

This works, except... memory allocations again. Because our Config structure has a string field, we know that std.json.parseFromSlice is going to have to allocate memory. That's why we pass our allocator.

Our code works, but it's leaking memory: config.root needs to be freed! Now in this case, that's pedantic, since we're only creating a single Config and it'll probably exist from start to end. But, for completeness, we can modify our main:

pub fn main() !void {
  // we have a few choices, but this is a safe default
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

  const config = try readConfig(allocator, "config.json");
  defer allocator.free(config.root);
  ...
}

If we had more fields, we might want to create a deinit function in Config to better encapsulate the logic. Our main would be much cleaner if it could just call defer config.deinit(allocator);.
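Here's a minimal sketch of what that could look like. The log_path field is hypothetical (our real Config only has root), and the two dupe calls are stand-ins for what the JSON parser would allocate.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

const Config = struct {
    root: []const u8,
    // hypothetical second field, not part of the Config used in this post
    log_path: []const u8,

    // frees every field that was allocated with `allocator`
    pub fn deinit(self: Config, allocator: Allocator) void {
        allocator.free(self.root);
        allocator.free(self.log_path);
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // stand-in for readConfig: both fields are heap-allocated copies
    const root = try allocator.dupe(u8, "/tmp/demo");
    errdefer allocator.free(root);
    const log_path = try allocator.dupe(u8, "/tmp/demo.log");

    const config = Config{ .root = root, .log_path = log_path };
    defer config.deinit(allocator);

    std.debug.print("config.root: {s}\n", .{config.root});
}
```

The errdefer is there so that root isn't leaked if the second allocation fails before config takes ownership of both fields.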

The json package has a parseFree function which can free any memory that parse allocated. I'm sure that's useful in complex cases, but in our simple case, I'm not really a fan. If you wanted to, you'd use it as:

std.json.parseFree(Config, allocator, config);

The allocator we pass to parseFree has to be the same passed to parseFromSlice.

Default Values

With the above code, std.json.parseFromSlice will fail with an error.MissingField if our JSON file doesn't have a root field. But what if we wanted to make it optional with a default value? There are two options. The first is to specify the default value in our structure:

const Config = struct {
  root: []const u8 = "/tmp/demo",
};

This will work, but there's one major issue: we can no longer safely free config.root. If the root comes from json, then it's allocated with our allocator and can/must be freed with our allocator. But if it's a default, it's global data and trying to free it with our allocator will crash. In this case, using the previously mentioned std.json.parseFree is a viable solution as it will only free the memory that parse allocated.

Another option is to use an Optional type and default to null:

const Config = struct {
  root: ?[]const u8 = null,
};

When we want to use the value, we can use orelse to set the default:

lmdb.open(config.root orelse "/tmp/db")

And we can conditionally free:

if (config.root) |root| {
  allocator.free(root);
}
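Putting the optional, the orelse default and the conditional free together, a rough sketch might look like this. The rootOrDefault and deinit methods are my own additions for illustration, and it assumes that a root which was set came from our allocator (as it would when parsed from JSON).

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

const Config = struct {
    root: ?[]const u8 = null,

    // hypothetical helper: keeps the default in one place
    pub fn rootOrDefault(self: Config) []const u8 {
        return self.root orelse "/tmp/demo";
    }

    // only frees root if it was set (and therefore allocated)
    pub fn deinit(self: Config, allocator: Allocator) void {
        if (self.root) |root| allocator.free(root);
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    // stand-ins for what the parser would give us with and without a root
    const missing = Config{};
    defer missing.deinit(allocator);
    const present = Config{ .root = try allocator.dupe(u8, "/var/data") };
    defer present.deinit(allocator);

    std.debug.print("{s}\n{s}\n", .{ missing.rootOrDefault(), present.rootOrDefault() });
}
```

The default string is never handed to free, so the problem from the first approach goes away.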

Arena Allocator

One option we could consider is an ArenaAllocator. An ArenaAllocator wraps another allocator (like the GeneralPurposeAllocator we've been using), and makes it so we free the entire arena in one shot instead of freeing each allocation individually. It's suitable for a number of cases, such as using an arena per http request. In our case, since we're talking about data that lives from the start of the program until the end, it's probably not that useful. But, let's look at it anyways. We just need a slight change to our main:

pub fn main() !void {
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  var arena = std.heap.ArenaAllocator.init(gpa.allocator());
  defer arena.deinit();
  const allocator = arena.allocator();

  ...
}

We still have an allocator of type std.mem.Allocator, so we can still pass this to readConfig, readFileAlloc and std.json.parseFromSlice, but we no longer need to free the memory as it'll all be freed when we call arena.deinit().

However, we should still keep all of our existing code as-is. In other words, in readConfig we should still call: defer allocator.free(data);. Why? Because readConfig only knows that it received a std.mem.Allocator, it doesn't know that it's an ArenaAllocator. Plus, in the future, things could change. Calling free on an ArenaAllocator is essentially a no-op (at most it reclaims the most recent allocation), so it costs us nothing. We definitely want to leave it in.
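As a small sketch of why that matters, here's a function that frees its own scratch memory and works unchanged with either allocator. The pathLen helper is hypothetical, it only exists to have something that allocates and frees.

```zig
const std = @import("std");
const Allocator = std.mem.Allocator;

// Hypothetical helper that, like readConfig, only knows it was handed
// some std.mem.Allocator. It frees its scratch memory regardless.
fn pathLen(allocator: Allocator, dir: []const u8, name: []const u8) !usize {
    const joined = try std.fmt.allocPrint(allocator, "{s}/{s}", .{ dir, name });
    defer allocator.free(joined);
    return joined.len;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    var arena = std.heap.ArenaAllocator.init(gpa.allocator());
    // anything still held by the arena is released here, in one shot
    defer arena.deinit();

    // same function, two different allocators
    const a = try pathLen(arena.allocator(), "/tmp", "config.json");
    const b = try pathLen(gpa.allocator(), "/tmp", "config.json");
    std.debug.print("{d} {d}\n", .{ a, b });
}
```

If pathLen skipped the free because "it's an arena anyway", the second call would leak.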

As a final tweak that I'll just mention, rather than having our ArenaAllocator wrap our GeneralPurposeAllocator, we could use a FixedBufferAllocator. A FixedBufferAllocator is initialized with a []u8 and cannot grow beyond that. Thus it's both efficient and provides some protection against misuse.
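A minimal sketch of that "cannot grow" behaviour, using the FixedBufferAllocator directly on a small stack buffer (the 64, 32 and 128 byte sizes are arbitrary, picked to force a failure):

```zig
const std = @import("std");

pub fn main() !void {
    // all allocations come out of this fixed 64-byte buffer
    var buffer: [64]u8 = undefined;
    var fba = std.heap.FixedBufferAllocator.init(&buffer);
    const allocator = fba.allocator();

    const small = try allocator.alloc(u8, 32);
    std.debug.print("got {d} bytes\n", .{small.len});

    // asking for more than the buffer can hold fails instead of growing
    if (allocator.alloc(u8, 128)) |_| {
        std.debug.print("unexpected success\n", .{});
    } else |err| {
        std.debug.print("second alloc failed: {s}\n", .{@errorName(err)});
    }
}
```

A config file larger than the buffer would surface as error.OutOfMemory rather than as unbounded memory use.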

Conclusion

The complete code to read a json config file into a config structure is:

const std = @import("std");
const Allocator = std.mem.Allocator;

pub fn main() !void {
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

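  const config = try readConfig(allocator, "config.json");
  // config.root was allocated while parsing, so we free it when we're done
  defer allocator.free(config.root);
  std.debug.print("config.root: {s}\n", .{config.root});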
}

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  // 512 is the maximum size to read, if your config is larger
  // you should make this bigger.
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);
  return try std.json.parseFromSlice(Config, allocator, data, .{});
}

const Config = struct {
  root: []const u8,
};

It would be nice to parse our JSON in a single line, and I don't get why readFileAlloc is a member of Dir (but I bet there's a reason). Overall though, it's pretty painless. Hopefully this helps someone :)