<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/assets/feed.xsl" type="text/xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>openmymind.net</title>
	<subtitle>Programming blog exploring Zig, Elixir, Go, Testing, Design and Performance</subtitle>
	<link href="https://www.openmymind.net/atom.xml" rel="self"/>
	<link href="https://www.openmymind.net/"/>
	<updated>2025-09-20T00:00:00Z</updated>
	<id>https://www.openmymind.net/</id>
	<author><name>Karl Seguin</name></author>
	<entry>
		<title>Is Zig&#39;s New Writer Unsafe?</title>
		<link href="https://www.openmymind.net/Is-Zigs-New-Io-Unsafe/"/>
		<updated>2025-09-20T00:00:00Z</updated>
		<id>/Is-Zigs-New-Io-Unsafe/</id>
		<content type="html">
			
&lt;p&gt;If we wanted to write a function that takes one of Zig&#39;s new &lt;code&gt;*std.Io.Reader&lt;/code&gt;s and writes its contents to stdout, we might start with something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn output(r: *std.Io.Reader) !void {
    const stdout = std.fs.File.stdout();
    var buffer: [???]u8 = undefined;
    var writer = stdout.writer(&amp;buffer);
    _ = try r.stream(&amp;writer.interface, .unlimited);
    try writer.interface.flush();
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But what should the size of &lt;code&gt;buffer&lt;/code&gt; be? If this were a one-and-done, maybe we&#39;d leave it empty or pick some seemingly sensible default, like 1K or 4K. If it were a mission-critical piece of code, maybe we&#39;d benchmark it or make it platform dependent.&lt;/p&gt;

&lt;p&gt;But unless I&#39;m missing something, whatever size we use, this function&#39;s behavior is undefined. You see, the issue is that readers can require a specific buffer size on a writer (and writers can require a specific buffer size on a reader). For example, this code, with a small buffer of 64, fails an assertion in debug mode, and falls into an endless loop in release mode:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var fixed = std.Io.Reader.fixed(&amp;.{
        40, 181, 47, 253, 36, 110, 149, 0, 0, 88, 111, 118, 101, 114, 32, 57,
        48, 48, 48, 33, 10, 1, 0, 192, 105, 241, 2, 170, 69, 248, 150
    });

    var decompressor = std.compress.zstd.Decompress.init(&amp;fixed, &amp;.{}, .{});
    try output(&amp;decompressor.reader);
}

fn output(r: *std.Io.Reader) !void {
    const stdout = std.fs.File.stdout();
    var buffer: [64]u8 = undefined;
    var writer = stdout.writer(&amp;buffer);
    _ = try r.stream(&amp;writer.interface, .unlimited);
    try writer.interface.flush();
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Some might argue that this is a documentation challenge. It&#39;s true that the documentation for &lt;code&gt;zstd.Decompress&lt;/code&gt; mentions what a &lt;code&gt;Writer&lt;/code&gt;&#39;s buffer must be. &lt;strong&gt;But this is not a documentation problem&lt;/strong&gt;. There are legitimate scenarios where the nature of a &lt;code&gt;Reader&lt;/code&gt; is unknown (or, at least, difficult to figure out). The type of a reader could be conditional, say, based on an HTTP response header. A library developer might take a &lt;code&gt;Reader&lt;/code&gt; as an input and present their own &lt;code&gt;Reader&lt;/code&gt; as an output - what buffer requirement should they document?&lt;/p&gt;

&lt;p&gt;Worse is that the failure can be conditional on the input. For example, if we change our source to:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var fixed = std.Io.Reader.fixed(&amp;.{
    40, 181, 47, 253, 36, 11, 89, 0, 0, 111, 118, 101, 114, 32, 57,
    48, 48, 48, 33, 10, 112, 149, 178, 212,
});&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Everything works, making this misconfiguration particularly hard to catch early.&lt;/p&gt;

&lt;p&gt;To me this seems almost impossible - like, I must be doing something wrong. And if I am, I&#39;m sorry. But, if I&#39;m not, this is a problem right?&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Everything is a []u8</title>
		<link href="https://www.openmymind.net/Everything-Is-A-u8-array/"/>
		<updated>2025-09-07T00:00:00Z</updated>
		<id>/Everything-Is-A-u8-array/</id>
		<content type="html">
			
&lt;p&gt;If you&#39;re coming to Zig from a more hand-holding language, one of the things worth exploring is the relationship between the compiler and memory. I think code is the best way to do that, but briefly put into words: the memory that your program uses is all just bytes; it is only the compile-time information (the type system) that gives meaning to and dictates how that memory is used and interpreted. This is meaningful in Zig and other similar languages because developers are allowed to override how the compiler interprets those bytes.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;This is something I&#39;ve written about before; longtime readers might find this post repetitive.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Consider this code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  std.debug.print(&quot;{d}&#92;n&quot;, .{@sizeOf(User)});
}

const User = struct {
  id: u32,
  name: []const u8,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It &lt;em&gt;should&lt;/em&gt; print 24. The point of this post isn&#39;t &lt;em&gt;why&lt;/em&gt; it prints 24. What&#39;s important here is that when we create a &lt;code&gt;User&lt;/code&gt; - whether it&#39;s on the stack or the heap - it is represented by 24 bytes of memory.&lt;/p&gt;

&lt;p&gt;If you examine those 24 bytes, there&#39;s nothing &quot;User&quot; about them. The memory isn&#39;t self-describing - that would be inefficient. Rather, it&#39;s the compiler itself that maintains metadata about memory. Very naively, we could imagine that the compiler maintains a lookup where the key is the variable name and the value is the memory address (our 24 bytes) + the type (&lt;code&gt;User&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;The fun, and sometimes useful, thing about this is that we can alter the compiler&#39;s metadata. Here&#39;s a working but impractical example:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};
  const tea: *Tea = @ptrCast(&amp;user);
  std.debug.print(&quot;{any}&#92;n&quot;, .{tea});
}

const User = struct {
  id: u32,
  name: []const u8,
};

const Tea = struct {
  price: u32,
  type: TeaType,

  const TeaType = enum {
    black,
    white,
    green,
    herbal,
  };
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;First we create a &lt;code&gt;User&lt;/code&gt; - nothing unusual about that. Next we use &lt;a href=&quot;https://www.openmymind.net/Zig-Tiptoeing-Around-ptrCast/&quot;&gt;@ptrCast&lt;/a&gt; to tell the compiler to treat the memory referenced by &lt;code&gt;user&lt;/code&gt; as a &lt;code&gt;*Tea&lt;/code&gt;. &lt;code&gt;@ptrCast&lt;/code&gt; works on addresses, which is why we give it the address of (&lt;code&gt;&amp;&lt;/code&gt;) &lt;code&gt;user&lt;/code&gt; and get back a pointer (&lt;code&gt;*&lt;/code&gt;) to &lt;code&gt;Tea&lt;/code&gt;. Here the return type of &lt;code&gt;@ptrCast&lt;/code&gt; is inferred from the type it&#39;s being assigned to.&lt;/p&gt;

&lt;p&gt;You might have some questions like what does it print? Or, is it safe? And, is this ever useful?&lt;/p&gt;

&lt;p&gt;We&#39;ll dig more into the safety of this in a bit. But briefly, the main concern is about the size of our structures. If &lt;code&gt;@sizeOf(User)&lt;/code&gt; is 24 bytes, we&#39;ll be able to re-interpret that memory as anything which is 24 bytes or less. The &lt;code&gt;@sizeOf(Tea)&lt;/code&gt; is 8 bytes, so this is safe.&lt;/p&gt;
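
&lt;p&gt;If we wanted to be defensive about that size relationship, we could assert it at compile time rather than rely on a comment. A small sketch (assuming the &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Tea&lt;/code&gt; types above are in scope):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

comptime {
  // re-interpreting a User&#39;s memory as a Tea is only size-safe
  // if a Tea fits within a User
  std.debug.assert(@sizeOf(Tea) &lt;= @sizeOf(User));
}&lt;/code&gt;&lt;/pre&gt;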

&lt;p&gt;I get different results on each run:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
.{ .price = 39897726, .type = .white }
.{ .price = 75123326, .type = .white }
.{ .price = 6441598, .type = .white }
.{ .price = 77826686, .type = .white }
.{ .price = 4950654, .type = .white }
.{ .price = 69438078, .type = .white }
.{ .price = 78498430, .type = .white }
.{ .price = 79022718, .type = .white }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&#39;s possible (but not likely) that you get consistent results. I find these results surprising. If I had to imagine what the 24 bytes of &lt;code&gt;user&lt;/code&gt; look like, I&#39;d come up with:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
 41, 35, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, x, x, x, x, x, x, x, x&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why that? Well, I&#39;d expect the first 8 bytes to be the id, 9001, which has a byte representation of &lt;code&gt;41, 35, 0, 0, 0, 0, 0, 0&lt;/code&gt;. The next 8 bytes, I think, would be the string length, or &lt;code&gt;4, 0, 0, 0, 0, 0, 0, 0&lt;/code&gt;. The last 8 bytes would be the pointer to the actual string value - an address that I have no way of guessing, so I mark it with &lt;code&gt;x, x, x, x, x, x, x, x&lt;/code&gt;.

&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;If you think the &lt;code&gt;id&lt;/code&gt; should only take 4 bytes, given that it&#39;s a u32, good! But Zig will usually align struct fields, so it really will take 8 bytes. That isn&#39;t something we&#39;ll dive into in this post, though.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Since &lt;code&gt;Tea&lt;/code&gt; is only 8 bytes and since the first 8 bytes of &lt;code&gt;user&lt;/code&gt; are always the same (only the pointer to the name value changes from instance to instance and from run to run), shouldn&#39;t we always get the same &lt;code&gt;Tea&lt;/code&gt; value?&lt;/p&gt;

&lt;p&gt;Yes, but only if I&#39;m correct about the contents of those 24 bytes for &lt;code&gt;user&lt;/code&gt;. Unless we tell it otherwise, Zig makes no guarantees about how it lays out the fields of a struct. The fact that our &lt;code&gt;tea&lt;/code&gt; keeps changing makes me believe that, for reasons I don&#39;t know, Zig decided to put the pointer to our name at the start.&lt;/p&gt;

&lt;p&gt;The reason you might get different results is that Zig might have organized the user&#39;s memory differently based on your platform or version of Zig (or any other factor, but those are the two most realistic reasons).&lt;/p&gt;

&lt;p&gt;So while this code might never crash, doesn&#39;t the lack of guarantee make it useless? No - at least not in the following three cases.&lt;/p&gt;

&lt;h3 id=&quot;welldefined&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#welldefined&quot; aria-hidden=&quot;true&quot;&gt;Well-Defined In-Memory Layout&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;While Zig usually doesn&#39;t make guarantees about how data will be organized, C &lt;strong&gt;does&lt;/strong&gt;. In Zig, a structure declared as &lt;code&gt;extern&lt;/code&gt; follows that specification. We can similarly declare a structure as &lt;code&gt;packed&lt;/code&gt;, which also has a well-defined memory layout (just not necessarily the same as C&#39;s / &lt;code&gt;extern&lt;/code&gt;&#39;s).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;extern&lt;/code&gt; and &lt;code&gt;packed&lt;/code&gt; structs can only contain fields which themselves have a well-defined memory layout. In order for a struct to have a well-known memory layout, all of its fields must have a well-known memory layout. They can&#39;t, for example, contain slices - which don&#39;t have a guaranteed layout. Still, here&#39;s a reliable and realistic example:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var manager = Manager{.id = 4, .name = &quot;Leto&quot;, .name_len = 4, .level = 99};
  const user: *User = @ptrCast(&amp;manager);
  std.debug.print(&quot;{d}: {s}&#92;n&quot;, .{user.id, user.name[0..user.name_len]});
}

const User = extern struct {
  id: u32,
  name: [*c]const u8,
  name_len: usize,
};

const Manager = extern struct {
  id: u32,
  name: [*c]const u8,
  name_len: usize,
  level: u16,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Part of the guarantee is that the fields are laid out in the order that they&#39;re declared. Above, when I guessed at the layout of &lt;code&gt;user&lt;/code&gt;, I made that assumption - but it&#39;s only valid for &lt;code&gt;extern&lt;/code&gt; structs. We can be sure that the above code will print &lt;code&gt;4: Leto&lt;/code&gt; because &lt;code&gt;Manager&lt;/code&gt; has the same fields as &lt;code&gt;User&lt;/code&gt; and in the same order. We can, and should, make this more explicit:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const Manager = extern struct {
  user: User,
  level: u16,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because type information is just compiler metadata, both declarations of &lt;code&gt;Manager&lt;/code&gt; are the same - they&#39;re the same size and have the same layout. There&#39;s no overhead to embedding the &lt;code&gt;User&lt;/code&gt; into &lt;code&gt;Manager&lt;/code&gt; this way.&lt;/p&gt;

&lt;p&gt;This type of memory-reinterpretation can be found in some C code and is thus seen in Zig code that interacts with such a C codebase.&lt;/p&gt;

&lt;h3 id=&quot;builtins&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#builtins&quot; aria-hidden=&quot;true&quot;&gt;Leveraging Zig Builtins&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;While we can&#39;t assume anything about the memory layout of a struct that isn&#39;t &lt;code&gt;extern&lt;/code&gt; (or &lt;code&gt;packed&lt;/code&gt;), we can leverage various built-in functions to programmatically figure things out, such as &lt;code&gt;@sizeOf&lt;/code&gt;. Probably the most useful is &lt;code&gt;@offsetOf&lt;/code&gt;, which gives us the offset of a field in bytes.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  std.debug.print(&quot;name offset: {d}&#92;n&quot;, .{@offsetOf(User, &quot;name&quot;)});
  std.debug.print(&quot;id offset: {d}&#92;n&quot;, .{@offsetOf(User, &quot;id&quot;)});
}

const User = struct {
  id: u32,
  name: []const u8,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For me, this prints:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
name offset: 0
id offset: 16&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This helps confirm that Zig did, in fact, put the &lt;code&gt;name&lt;/code&gt; before the &lt;code&gt;id&lt;/code&gt;. We saw the result of that when we treated the user&#39;s memory as an instance of &lt;code&gt;Tea&lt;/code&gt;. If we wanted to create a &lt;code&gt;Tea&lt;/code&gt; based on the address of &lt;code&gt;user.id&lt;/code&gt; rather than &lt;code&gt;user&lt;/code&gt;, we could do:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};

  // changed from &amp;user to &amp;user.id
  const tea: *Tea = @ptrCast(&amp;user.id);
  std.debug.print(&quot;{any}&#92;n&quot;, .{tea});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This will now always output the same result. But how would we take &lt;code&gt;tea&lt;/code&gt; and get a &lt;code&gt;user&lt;/code&gt; out of it? Generally speaking, this wouldn&#39;t be safe since &lt;code&gt;@sizeOf(Tea) &amp;lt; @sizeOf(User)&lt;/code&gt; - the memory created to hold an instance of &lt;code&gt;Tea&lt;/code&gt;, 8 bytes, can&#39;t represent the 24 bytes needed for &lt;code&gt;User&lt;/code&gt;. But for this instance of &lt;code&gt;Tea&lt;/code&gt;, we know that there are 24 bytes available &quot;around&quot; &lt;code&gt;tea&lt;/code&gt;. Where exactly those 24 bytes start depends on the relative position of &lt;code&gt;user.id&lt;/code&gt; to &lt;code&gt;user&lt;/code&gt; itself. If we don&#39;t adjust for that offset, we risk crashing unless the offset happens to be 0. Since we know the offset is 16, not 0, this should crash:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};
  const tea: *Tea = @ptrCast(&amp;user.id);

  // cast the *Tea itself (not its address) back to a *User
  const user2: *User = @ptrCast(@alignCast(tea));
  std.debug.print(&quot;{any}&#92;n&quot;, .{user2});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is our &lt;code&gt;user&lt;/code&gt;&#39;s memory (as 24 contiguous bytes, broken up into its 3 8-byte fields):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
name.ptr =&gt; x, x, x, x, x, x, x, x
name.len =&gt; 4, 0, 0, 0, 0, 0, 0, 0
id       =&gt; 41, 35, 0, 0, 0, 0, 0, 0&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And when we make &lt;code&gt;tea&lt;/code&gt; from &lt;code&gt;&amp;user.id&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
       name.ptr =&gt; x, x, x, x, x, x, x, x
       name.len =&gt; 4, 0, 0, 0, 0, 0, 0, 0
tea =&gt; id       =&gt; 41, 35, 0, 0, 0, 0, 0, 0
       more memory, but not ours to play with&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we try to cast &lt;code&gt;tea&lt;/code&gt; back into a &lt;code&gt;*User&lt;/code&gt;, we&#39;ll be 16 bytes off and end up reading 16 bytes of memory adjacent to &lt;code&gt;tea&lt;/code&gt; which isn&#39;t ours.&lt;/p&gt;

&lt;p&gt;To make this work, we need to take the address of &lt;code&gt;tea&lt;/code&gt; and subtract &lt;code&gt;@offsetOf(User, &quot;id&quot;)&lt;/code&gt; from it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};
  const tea: *Tea = @ptrCast(&amp;user.id);
  const user2: *User = @ptrFromInt(@intFromPtr(tea) - @offsetOf(User, &quot;id&quot;));
  std.debug.print(&quot;{any}&#92;n&quot;, .{user2});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because we use &lt;code&gt;@offsetOf&lt;/code&gt;, it no longer matters how the structure is laid out. We&#39;re always able to find the starting address of &lt;code&gt;user&lt;/code&gt; based on the address of &lt;code&gt;user.id&lt;/code&gt; (which is where &lt;code&gt;tea&lt;/code&gt; points to) because we know &lt;code&gt;@offsetOf(User, &quot;id&quot;)&lt;/code&gt;.&lt;/p&gt;
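
&lt;p&gt;This pattern is common enough that Zig has a builtin for it: &lt;code&gt;@fieldParentPtr&lt;/code&gt;. In recent versions of Zig, it infers the parent type from the result location - the exact signature has changed between releases, so treat this as a sketch:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};

  // recover the containing User from a pointer to its id field
  const user2: *User = @fieldParentPtr(&quot;id&quot;, &amp;user.id);
  std.debug.print(&quot;{any}&#92;n&quot;, .{user2});
}&lt;/code&gt;&lt;/pre&gt;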

&lt;h3 id=&quot;asmemory&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#asmemory&quot; aria-hidden=&quot;true&quot;&gt;As Raw Memory&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The above example is convoluted. There&#39;s no relationship between the data of a &lt;code&gt;User&lt;/code&gt; and of &lt;code&gt;Tea&lt;/code&gt;. What does it mean to create &lt;code&gt;Tea&lt;/code&gt; out of a user&#39;s &lt;code&gt;id&lt;/code&gt;? Nothing.&lt;/p&gt;

&lt;p&gt;What if we forget about &lt;code&gt;user&lt;/code&gt;&#39;s data, the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;name&lt;/code&gt;, and treat those 24 bytes as usable space?&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};

  const tea: *Tea = @ptrCast(&amp;user);
  tea.* = .{.price = 2492, .type = .black};

  std.debug.print(&quot;{any}&#92;n&quot;, .{tea});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;user&lt;/code&gt; and &lt;code&gt;tea&lt;/code&gt; still share the same memory. We cannot safely use &lt;code&gt;user&lt;/code&gt; after writing to &lt;code&gt;tea.*&lt;/code&gt; - that write might have stored data that cannot safely be interpreted as a &lt;code&gt;User&lt;/code&gt;. Specifically in this case, the write to tea has probably made &lt;code&gt;name.ptr&lt;/code&gt; point to invalid memory. But if we&#39;re done with &lt;code&gt;user&lt;/code&gt; and know it won&#39;t be used again, we just saved a few bytes of memory by re-using its space.&lt;/p&gt;

&lt;p&gt;This can go on forever. We can safely re-use the space to create another &lt;code&gt;User&lt;/code&gt;, as long as we&#39;re 100% sure that we&#39;re done with &lt;code&gt;tea&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
  var user = User{.id = 9001, .name = &quot;Goku&quot;};

  const tea: *Tea = @ptrCast(&amp;user);
  tea.* = .{.price = 2492, .type = .black};
  std.debug.print(&quot;{any}&#92;n&quot;, .{tea});

  const user2: *User = @ptrCast(@alignCast(tea));
  user2.* = .{.id = 32, .name = &quot;Son Goku&quot;};
  std.debug.print(&quot;{any}&#92;n&quot;, .{user2});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can re-use those 24 bytes to represent anything that takes 24 bytes of memory or less.&lt;/p&gt;

&lt;p&gt;The best practical example of this is &lt;code&gt;std.heap.MemoryPool(T)&lt;/code&gt;. The &lt;code&gt;MemoryPool&lt;/code&gt; is an allocator that can create a single type, &lt;code&gt;T&lt;/code&gt;. That might not sound particularly useful, but using what we&#39;ve learned so far, it can efficiently re-use the memory of discarded values.&lt;/p&gt;
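
&lt;p&gt;Before building our own, here&#39;s roughly what using the real thing looks like - a sketch of the &lt;code&gt;MemoryPool&lt;/code&gt; API as I understand it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var pool = std.heap.MemoryPool(User).init(allocator);
defer pool.deinit();

// allocated by the backing allocator the first time,
// re-used from the pool&#39;s free list after a destroy
const user = try pool.create();
defer pool.destroy(user);&lt;/code&gt;&lt;/pre&gt;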

&lt;p&gt;We&#39;ll build a simplified version to see how it works, starting with a basic API - one without any recycling ability. Further, rather than make it generic, we&#39;ll make a &lt;code&gt;UserPool&lt;/code&gt; specific for &lt;code&gt;User&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
const Allocator = std.mem.Allocator;

pub const UserPool = struct {
  allocator: Allocator,

  pub fn init(allocator: Allocator) UserPool {
    return .{
      .allocator = allocator,
    };
  }

  pub fn create(self: *UserPool) !*User {
    return self.allocator.create(User);
  }

  pub fn destroy(self: *UserPool, user: *User) void {
    self.allocator.destroy(user);
  }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As-is, this is just a wrapper that limits what the allocator is able to create. Not particularly useful. But what if, instead of destroying a &lt;code&gt;user&lt;/code&gt;, we made it available to a subsequent &lt;code&gt;create&lt;/code&gt;? One way to do that would be to hold an &lt;code&gt;std.SinglyLinkedList&lt;/code&gt;. But for that to work, we&#39;d need to make additional allocations - the linked list node has to exist somewhere. But why? The &lt;code&gt;@sizeOf(User)&lt;/code&gt; is large enough to be used as-is, and whenever a &lt;code&gt;user&lt;/code&gt; is destroyed, we&#39;re being told that memory is free to be used. If an application &lt;em&gt;did&lt;/em&gt; use a &lt;code&gt;user&lt;/code&gt; after destroying it, it would be undefined behavior, just like it is with any other allocator. Let&#39;s add a bit of decoration to our &lt;code&gt;UserPool&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const UserPool = struct {
  allocator: Allocator,
  free_list: ?*FreeEntry = null,

  const FreeEntry = struct {
      next: ?*FreeEntry,
  };

  // rest is unchanged . . . for now.
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&#39;ve added a linked list to our &lt;code&gt;UserPool&lt;/code&gt;. Every &lt;code&gt;FreeEntry&lt;/code&gt; points to another &lt;code&gt;*FreeEntry&lt;/code&gt; or &lt;code&gt;null&lt;/code&gt;, including the initial one referenced by &lt;code&gt;free_list&lt;/code&gt;. Now we change &lt;code&gt;destroy&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const UserPool = struct {
  // ...

  pub fn destroy(self: *UserPool, user: *User) void {
    const entry: *FreeEntry = @ptrCast(user);
    entry.* = .{ .next = self.free_list };
    self.free_list = entry;
  }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We use the ideas we&#39;ve explored above to create a simple linked list. All that&#39;s left is to change &lt;code&gt;create&lt;/code&gt; to leverage it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const UserPool = struct {
  // ...

  pub fn create(self: *UserPool) !*User {
    if (self.free_list) |entry| {
      self.free_list = entry.next;
      return @ptrCast(entry);
    }
    return self.allocator.create(User);
  }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we have a &lt;code&gt;FreeEntry&lt;/code&gt;, then we can turn that into a &lt;code&gt;*User&lt;/code&gt;. We make sure to advance our &lt;code&gt;free_list&lt;/code&gt; to the next entry, which might be &lt;code&gt;null&lt;/code&gt;. If there isn&#39;t an available &lt;code&gt;FreeEntry&lt;/code&gt;, we fall back to the allocator and create a new &lt;code&gt;User&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As a final step, we should add a &lt;code&gt;deinit&lt;/code&gt; to free the memory held by our &lt;code&gt;free_list&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const UserPool = struct {
  // ...

  pub fn deinit(self: *UserPool) void {
    var entry = self.free_list;
    while (entry) |e| {
      entry = e.next;
      const user: *User = @ptrCast(e);
      self.allocator.destroy(user);
    }
    self.free_list = null;
  }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That final &lt;code&gt;@ptrCast&lt;/code&gt; from a &lt;code&gt;*FreeEntry&lt;/code&gt; to a &lt;code&gt;*User&lt;/code&gt; might seem unnecessary. If we&#39;re freeing the memory, why does the type matter? But allocators only know how much memory to free because the compiler tells them - based on the type. Freeing &lt;code&gt;e&lt;/code&gt;, a &lt;code&gt;*FreeEntry&lt;/code&gt;, would only work if &lt;code&gt;@sizeOf(FreeEntry) == @sizeOf(User)&lt;/code&gt; (which it isn&#39;t).&lt;/p&gt;

&lt;p&gt;In addition to being generic, Zig&#39;s actual &lt;code&gt;MemoryPool&lt;/code&gt; is a bit more sophisticated, handling different alignments and even handling the case where &lt;code&gt;@sizeOf(T) &lt; @sizeOf(FreeEntry)&lt;/code&gt;, but our &lt;code&gt;UserPool&lt;/code&gt; is pretty close.&lt;/p&gt;
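
&lt;p&gt;To see the recycling in action, we can destroy a &lt;code&gt;User&lt;/code&gt; and then create another - with the &lt;code&gt;UserPool&lt;/code&gt; we just built, the second &lt;code&gt;create&lt;/code&gt; should hand back the same memory:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var pool = UserPool.init(allocator);
defer pool.deinit();

const user1 = try pool.create();
pool.destroy(user1);

// pops the entry that destroy pushed onto the free_list
const user2 = try pool.create();
std.debug.assert(user1 == user2);
pool.destroy(user2);&lt;/code&gt;&lt;/pre&gt;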

&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Conclusion&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;By altering the compiler&#39;s view of our program, we can do all types of things and get into all types of trouble. While these manipulations can be done safely, they rely on understanding the lack of guarantees Zig makes. If you&#39;re programming in Zig, this is the type of thing you should try to get comfortable with. Most of this is fundamental regardless of the programming language; it&#39;s just that some languages, like Zig, give you more control.&lt;/p&gt;

&lt;p&gt;I had initially planned on writing a version of &lt;code&gt;MemoryPool&lt;/code&gt; which expanded on the standard library&#39;s. I wanted to create a pool for multiple types. For example, one that can be used for both &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Tea&lt;/code&gt; instances. The trick, of course, would be to always allocate memory for the largest supported type (&lt;code&gt;User&lt;/code&gt; in this case). But this post is already long, so I leave it as an exercise for you.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>I&#39;m too dumb for Zig&#39;s new IO interface</title>
		<link href="https://www.openmymind.net/Im-Too-Dumb-For-Zigs-New-IO-Interface/"/>
		<updated>2025-08-22T00:00:00Z</updated>
		<id>/Im-Too-Dumb-For-Zigs-New-IO-Interface/</id>
		<content type="html">
			
&lt;p&gt;You might have heard that Zig 0.15 introduces a new IO interface, with the focus for this release being the new std.Io.Reader and std.Io.Writer types. The old &quot;interfaces&quot; had problems. Like &lt;a href=&quot;https://github.com/ziglang/zig/issues/17985&quot;&gt;this performance issue&lt;/a&gt; that I opened. And it relied on a &lt;a href=&quot;https://www.openmymind.net/In-Zig-Whats-a-Writer/&quot;&gt;mix of types&lt;/a&gt;, which always confused me, and a lot of &lt;code&gt;anytype&lt;/code&gt; - which is generally great, but a poor foundation to build an interface on.&lt;/p&gt;

&lt;p&gt;I&#39;ve been slowly upgrading my libraries, and I ran into changes to the &lt;code&gt;tls.Client&lt;/code&gt; used by my smtp library. For the life of me, I just don&#39;t understand how it works.&lt;/p&gt;

&lt;p&gt;Zig has never been known for its documentation, but if we look at the documentation for &lt;code&gt;tls.Client.init&lt;/code&gt;, we&#39;ll find:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn init(input: *std.Io.Reader, output: *std.Io.Writer, options: Options) InitError!Client
Initiates a TLS handshake and establishes a TLSv1.2 or TLSv1.3 session.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So it takes one of these new Readers and a new Writer, along with some options (sneak peek: they aren&#39;t all optional). It doesn&#39;t look like you can just give it a &lt;code&gt;net.Stream&lt;/code&gt;, but &lt;code&gt;net.Stream&lt;/code&gt; does expose a &lt;code&gt;reader()&lt;/code&gt; and &lt;code&gt;writer()&lt;/code&gt; method, so that&#39;s probably a good place to start:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const stream = try std.net.tcpConnectToHost(allocator, &quot;www.openmymind.net&quot;, 443);
defer stream.close();

var writer = stream.writer(&amp;.{});
var reader = stream.reader(&amp;.{});

var tls_client = try std.crypto.tls.Client.init(
  reader.interface(),
  &amp;writer.interface,
  .{}, // options TODO
);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note that &lt;code&gt;stream.writer()&lt;/code&gt; returns a &lt;code&gt;Stream.Writer&lt;/code&gt; and &lt;code&gt;stream.reader()&lt;/code&gt; returns a &lt;code&gt;Stream.Reader&lt;/code&gt; - those aren&#39;t the types our &lt;code&gt;tls.Client&lt;/code&gt; expects. To convert the &lt;code&gt;Stream.Reader&lt;/code&gt; to an &lt;code&gt;*std.Io.Reader&lt;/code&gt;, we need to call its &lt;code&gt;interface()&lt;/code&gt; method. To get a &lt;code&gt;*std.Io.Writer&lt;/code&gt; from a &lt;code&gt;Stream.Writer&lt;/code&gt;, we need to take the address of its &lt;code&gt;interface&lt;/code&gt; field. This doesn&#39;t seem particularly consistent. Don&#39;t forget that the &lt;code&gt;writer&lt;/code&gt; and &lt;code&gt;reader&lt;/code&gt; need a stable address. Because I&#39;m trying to get the simplest example working, this isn&#39;t an issue - everything will live on the stack of &lt;code&gt;main&lt;/code&gt;. In a real-world example, I think it means that I&#39;ll always have to wrap the &lt;code&gt;tls.Client&lt;/code&gt; in my own heap-allocated type, giving the writer and reader a cozy, stable home.&lt;/p&gt;

&lt;p&gt;Speaking of allocations, you might have noticed that &lt;code&gt;stream.writer&lt;/code&gt; and &lt;code&gt;stream.reader&lt;/code&gt; take a parameter. It&#39;s the buffer they should use. Buffering is a first-class citizen of the new Io interface - who needs composition? The documentation &lt;strong&gt;does&lt;/strong&gt; tell me these need to be at least &lt;code&gt;std.crypto.tls.max_ciphertext_record_len&lt;/code&gt; large, so we need to fix things a bit:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var write_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
var writer = stream.writer(&amp;write_buf);

var read_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
var reader = stream.reader(&amp;read_buf);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here&#39;s where the code stands: &lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
  var gpa: std.heap.DebugAllocator(.{}) = .init;
  const allocator = gpa.allocator();

  const stream = try std.net.tcpConnectToHost(allocator, &quot;www.openmymind.net&quot;, 443);
  defer stream.close();

  var write_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var writer = stream.writer(&amp;write_buf);

  var read_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var reader = stream.reader(&amp;read_buf);

  var tls_client = try std.crypto.tls.Client.init(
      reader.interface(),
      &amp;writer.interface,
      .{
      },
  );
  defer tls_client.end() catch {};
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But if you try to run it, you&#39;ll get a compilation error. Turns out we have to provide 4 options: a &lt;code&gt;ca&lt;/code&gt; bundle, a &lt;code&gt;host&lt;/code&gt;, a &lt;code&gt;write_buffer&lt;/code&gt; and a &lt;code&gt;read_buffer&lt;/code&gt;. Normally I&#39;d expect the options parameter to be for optional parameters; I don&#39;t understand why some parameters (input and output) are passed one way while &lt;code&gt;write_buffer&lt;/code&gt; and &lt;code&gt;read_buffer&lt;/code&gt; are passed another.&lt;/p&gt;

&lt;p&gt;Let&#39;s give it what it wants AND send some data:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;

// existing setup...

var bundle = std.crypto.Certificate.Bundle{};
try bundle.rescan(allocator);
defer bundle.deinit(allocator);

var tls_client = try std.crypto.tls.Client.init(
  reader.interface(),
  &amp;writer.interface,
  .{
    .ca = .{.bundle = bundle},
    .host = .{ .explicit = &quot;www.openmymind.net&quot; } ,
    .read_buffer = &amp;.{},
    .write_buffer = &amp;.{},
  },
);
defer tls_client.end() catch {};

try tls_client.writer.writeAll(&quot;GET / HTTP/1.1&#92;r&#92;n&#92;r&#92;n&quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, if I try to run it, the program just hangs. I don&#39;t know what &lt;code&gt;write_buffer&lt;/code&gt; is, but I know Zig now loves buffers, so let&#39;s try to give it something:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;

// existing setup...

// I don&#39;t know what size this should/has to be??
var write_buf2: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;

var tls_client = try std.crypto.tls.Client.init(
  reader.interface(),
  &amp;writer.interface,
  .{
    .ca = .{.bundle = bundle},
    .host = .{ .explicit = &quot;www.openmymind.net&quot; } ,
    .read_buffer = &amp;.{},
    .write_buffer = &amp;write_buf2,
  },
);
defer tls_client.end() catch {};

try tls_client.writer.writeAll(&quot;GET / HTTP/1.1&#92;r&#92;n&#92;r&#92;n&quot;);&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Great, now the code doesn&#39;t hang; all we need to do is read the response. &lt;code&gt;tls.Client&lt;/code&gt; exposes a &lt;code&gt;reader: *std.Io.Reader&lt;/code&gt; field which is &quot;Decrypted stream from the server to the client.&quot; That sounds like what we want, but believe it or not, &lt;code&gt;std.Io.Reader&lt;/code&gt; doesn&#39;t have a &lt;code&gt;read&lt;/code&gt; method. It has a &lt;code&gt;peek&lt;/code&gt;, a &lt;code&gt;takeByteSigned&lt;/code&gt;, a &lt;code&gt;readSliceShort&lt;/code&gt; (which seems close, but it blocks until the provided buffer is full), a &lt;code&gt;peekArray&lt;/code&gt; and a lot more, but nothing like the &lt;code&gt;read&lt;/code&gt; I&#39;d expect. The closest I can find, which I think does what I want, is to stream it to a writer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var buf: [1024]u8 = undefined;
var w: std.Io.Writer = .fixed(&amp;buf);
const n = try tls_client.reader.stream(&amp;w, .limited(buf.len));
std.debug.print(&quot;read: {d} - {s}&#92;n&quot;, .{n, buf[0..n]});&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we try to run the code now, it crashes. We&#39;ve apparently failed an assertion regarding the length of a buffer. So it seems like we also &lt;em&gt;have&lt;/em&gt; to provide a &lt;code&gt;read_buffer&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here&#39;s my current version (it doesn&#39;t work, but it doesn&#39;t crash!):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
  var gpa: std.heap.DebugAllocator(.{}) = .init;
  const allocator = gpa.allocator();

  const stream = try std.net.tcpConnectToHost(allocator, &quot;www.openmymind.net&quot;, 443);
  defer stream.close();

  var write_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var writer = stream.writer(&amp;write_buf);

  var read_buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var reader = stream.reader(&amp;read_buf);

  var bundle = std.crypto.Certificate.Bundle{};
  try bundle.rescan(allocator);
  defer bundle.deinit(allocator);

  var write_buf2: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var read_buf2: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;

  var tls_client = try std.crypto.tls.Client.init(
      reader.interface(),
      &amp;writer.interface,
      .{
        .ca = .{.bundle = bundle},
        .host = .{ .explicit = &quot;www.openmymind.net&quot; } ,
        .read_buffer = &amp;read_buf2,
        .write_buffer = &amp;write_buf2,
      },
  );
  defer tls_client.end() catch {};

  try tls_client.writer.writeAll(&quot;GET / HTTP/1.1&#92;r&#92;n&#92;r&#92;n&quot;);

  var buf: [std.crypto.tls.max_ciphertext_record_len]u8 = undefined;
  var w: std.Io.Writer = .fixed(&amp;buf);
  const n = try tls_client.reader.stream(&amp;w, .limited(buf.len));
  std.debug.print(&quot;read: {d} - {s}&#92;n&quot;, .{n, buf[0..n]});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When I looked through Zig&#39;s source code, there&#39;s &lt;a href=&quot;https://github.com/ziglang/zig/blob/306176046e6ae5e30bc58e5f3bcf786159e367f2/lib/std/http/Client.zig#L329&quot;&gt;only one place&lt;/a&gt; using &lt;code&gt;tls.Client&lt;/code&gt;. It helped get me to where I am. I couldn&#39;t find any tests.&lt;/p&gt;

&lt;p&gt;I&#39;ll admit that during this migration, I&#39;ve missed some basic things. For example, someone had to help me find &lt;code&gt;std.fmt.printInt&lt;/code&gt; - the renamed version of &lt;code&gt;std.fmt.formatIntBuf&lt;/code&gt;. Maybe there&#39;s a helper like &lt;code&gt;tls.Client.init(allocator, stream)&lt;/code&gt; somewhere. And maybe it makes sense that we do &lt;code&gt;reader.interface()&lt;/code&gt; but &lt;code&gt;&amp;writer.interface&lt;/code&gt; - I&#39;m reminded of Go&#39;s &lt;code&gt;*http.Request&lt;/code&gt; and &lt;code&gt;http.ResponseWriter&lt;/code&gt;. And maybe Zig has some consistent rule for what parameters belong in options. And I know nothing about TLS, so maybe it makes complete sense to need 4 buffers. I feel a bit more confident about the weirdness of not having a &lt;code&gt;read(buf: []u8) !usize&lt;/code&gt; function on &lt;code&gt;Reader&lt;/code&gt;, but at this point I wouldn&#39;t bet on me.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Zig&#39;s new Writer</title>
		<link href="https://www.openmymind.net/Zigs-New-Writer/"/>
		<updated>2025-07-17T00:00:00Z</updated>
		<id>/Zigs-New-Writer/</id>
		<content type="html">
			
&lt;p&gt;As you might have heard, Zig&#39;s &lt;code&gt;Io&lt;/code&gt; namespace is being reworked. Eventually, this will mean the re-introduction of async. As a first step though, the Writer and Reader interfaces and some of the related code have been revamped.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;This post is written based on a mid-July 2025 development release of Zig. It doesn&#39;t apply to Zig 0.14.x (or any previous version) and is likely to be outdated as more of the Io namespace is reworked.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Not long ago, I wrote a blog post which tried to explain &lt;a href=&quot;https://www.openmymind.net/In-Zig-Whats-a-Writer/&quot;&gt;Zig&#39;s Writers&lt;/a&gt;. At best, I&#39;d describe the current state as &quot;confusing&quot; with two writer interfaces while often dealing with &lt;code&gt;anytype&lt;/code&gt;. And while &lt;code&gt;anytype&lt;/code&gt; is convenient, it lacks developer ergonomics. Furthermore, the current design has significant performance issues for some common cases.&lt;/p&gt;

&lt;h3 id=&quot;drain&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#drain&quot; aria-hidden=&quot;true&quot;&gt;Drain&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The new &lt;code&gt;Writer&lt;/code&gt; interface is &lt;code&gt;std.Io.Writer&lt;/code&gt;. At a minimum, implementations have to provide a &lt;code&gt;drain&lt;/code&gt; function. Its signature looks like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn drain(w: *Writer, data: []const []const u8, splat: usize) Error!usize&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You might be surprised that this is the method a custom writer needs to implement. Not only does it take an array of strings, but what&#39;s that &lt;code&gt;splat&lt;/code&gt; parameter? Like me, you might have expected a simpler &lt;code&gt;write&lt;/code&gt; method:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn write(w: *Writer, data: []const u8) Error!usize&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It turns out that &lt;code&gt;std.Io.Writer&lt;/code&gt; has buffering built-in. For example, if we want a &lt;code&gt;Writer&lt;/code&gt; for an &lt;code&gt;std.fs.File&lt;/code&gt;, we need to provide the buffer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var buffer: [1024]u8 = undefined;
var writer = my_file.writer(&amp;buffer);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Of course, if we don&#39;t want buffering, we can always pass an empty buffer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var writer = my_file.writer(&amp;.{});&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This explains why custom writers need to implement a &lt;code&gt;drain&lt;/code&gt; method, and not something simpler like &lt;code&gt;write&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The simplest way to implement &lt;code&gt;drain&lt;/code&gt;, and what a lot of the Zig standard library has been upgraded to while this larger overhaul takes place, is:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) !usize {
    _ = splat;
    const self: *@This() = @fieldParentPtr(&quot;interface&quot;, io_w);
    return self.writeAll(data[0]) catch return error.WriteFailed;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We ignore the &lt;code&gt;splat&lt;/code&gt; parameter, and just write the first value in &lt;code&gt;data&lt;/code&gt; (&lt;code&gt;data.len &gt; 0&lt;/code&gt; is guaranteed to be true). This turns &lt;code&gt;drain&lt;/code&gt; into what a simpler &lt;code&gt;write&lt;/code&gt; method would look like. Because we return the number of bytes written, &lt;code&gt;std.Io.Writer&lt;/code&gt; will know that we potentially didn&#39;t write all the data and will call &lt;code&gt;drain&lt;/code&gt; again, if necessary, with the rest of the data.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;If you&#39;re confused by the call to &lt;code&gt;@fieldParentPtr&lt;/code&gt;, check out my post on the &lt;a href=&quot;https://www.openmymind.net/Zigs-New-LinkedList-API/&quot;&gt;upcoming linked list changes&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;
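
&lt;p&gt;To make that concrete, here&#39;s a minimal, unbuffered custom writer built around the same basic &lt;code&gt;drain&lt;/code&gt;. Treat it as a sketch against a mid-2025 development release; the &lt;code&gt;CountingWriter&lt;/code&gt; type and its byte-counting behavior are my own invention, not anything from the standard library:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

// A toy writer that counts the bytes written to it
const CountingWriter = struct {
    count: usize = 0,
    interface: std.Io.Writer,

    fn init() CountingWriter {
        return .{
            // an empty buffer means every write goes straight to drain
            .interface = .{
                .buffer = &amp;.{},
                .vtable = &amp;.{ .drain = drain },
            },
        };
    }

    fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
        _ = splat;
        const self: *CountingWriter = @fieldParentPtr(&quot;interface&quot;, io_w);
        // only consume the first slice; the interface calls us again with the rest
        self.count += data[0].len;
        return data[0].len;
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With that in place, &lt;code&gt;cw.interface.writeAll(...)&lt;/code&gt; or &lt;code&gt;cw.interface.print(...)&lt;/code&gt; works like on any other writer, and &lt;code&gt;cw.count&lt;/code&gt; tracks how many bytes passed through.&lt;/p&gt;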

&lt;p&gt;The actual implementation of &lt;code&gt;drain&lt;/code&gt; for the &lt;code&gt;File&lt;/code&gt; is a non-trivial ~150 lines of code. It has platform-specific code and leverages &lt;a href=&quot;https://www.openmymind.net/TCP-Server-In-Zig-Part-3-Minimizing-Writes-and-Reads/#writev&quot;&gt;vectored I/O&lt;/a&gt; where possible. There&#39;s obviously flexibility to provide a simple implementation or a more optimized one.&lt;/p&gt;

&lt;h3 id=&quot;interface&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#interface&quot; aria-hidden=&quot;true&quot;&gt;The Interface&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Much like the current state, when you do &lt;code&gt;file.writer(&amp;buffer)&lt;/code&gt;, you don&#39;t get an &lt;code&gt;std.Io.Writer&lt;/code&gt;. Instead, you get a &lt;code&gt;File.Writer&lt;/code&gt;. To get an actual &lt;code&gt;std.Io.Writer&lt;/code&gt;, you need to access the &lt;code&gt;interface&lt;/code&gt; field. This is merely a convention, but expect it to be used throughout the standard, and third-party, library. Get ready to see a lot of &lt;code&gt;&amp;xyz.interface&lt;/code&gt; calls!&lt;/p&gt;

&lt;p&gt;This simplification of &lt;code&gt;File&lt;/code&gt; shows the relationship between the three types:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const File = struct {

  pub fn writer(self: *File, buffer: []u8) Writer{
    return .{
      .file = self,
      .interface = std.Io.Writer{
        .buffer = buffer,
        .vtable = .{.drain = Writer.drain},
      }
    };
  }

  pub const Writer = struct {
    file: *File,
    interface: std.Io.Writer,
    // this has a bunch of other fields

    fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) !usize {
      const self: *Writer = @fieldParentPtr(&quot;interface&quot;, io_w);
      // ....
    }
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The instance of &lt;code&gt;File.Writer&lt;/code&gt; needs to exist somewhere (e.g. on the stack) since that&#39;s where the &lt;code&gt;std.Io.Writer&lt;/code&gt; interface exists. It&#39;s possible that &lt;code&gt;File&lt;/code&gt; could directly have a &lt;code&gt;writer_interface: std.Io.Writer&lt;/code&gt; field, but that would limit you to one writer per file and would bloat the &lt;code&gt;File&lt;/code&gt; structure.&lt;/p&gt;

&lt;p&gt;We can see from the above that, while we call &lt;code&gt;Writer&lt;/code&gt; an &quot;interface&quot;, it&#39;s just a normal struct. It has a few fields beyond &lt;code&gt;buffer&lt;/code&gt; and &lt;code&gt;vtable.drain&lt;/code&gt;, but these are the only two with non-default values; we have to provide them. The &lt;code&gt;Writer&lt;/code&gt; interface implements a lot of typical &quot;writer&quot; behavior, such as &lt;code&gt;writeAll&lt;/code&gt; and &lt;code&gt;print&lt;/code&gt; (for formatted writing). It also has a number of methods which only a &lt;code&gt;Writer&lt;/code&gt; implementation would likely care about. For example, &lt;code&gt;File.Writer.drain&lt;/code&gt; has to call &lt;code&gt;consume&lt;/code&gt; so that the writer&#39;s internal state can be updated. Having all of these functions listed side-by-side in the documentation confused me at first. Hopefully it&#39;s something the documentation generation will one day be able to help disentangle.&lt;/p&gt;

&lt;h3 id=&quot;migrating&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#migrating&quot; aria-hidden=&quot;true&quot;&gt;Migrating&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The new &lt;code&gt;Writer&lt;/code&gt; has taken over a number of methods. For example, &lt;code&gt;std.fmt.formatIntBuf&lt;/code&gt; no longer exists. The replacement is the &lt;code&gt;printInt&lt;/code&gt; method of &lt;code&gt;Writer&lt;/code&gt;. But this requires a &lt;code&gt;Writer&lt;/code&gt; instance rather than the simple &lt;code&gt;[]u8&lt;/code&gt; previously required.&lt;/p&gt;

&lt;p&gt;It&#39;s easy to miss, but the &lt;code&gt;Writer.fixed([]u8) Writer&lt;/code&gt; function is what you&#39;re looking for. You&#39;ll use it for any function that was migrated to &lt;code&gt;Writer&lt;/code&gt; and used to operate on a &lt;code&gt;buffer: []u8&lt;/code&gt;.&lt;/p&gt;
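
&lt;p&gt;For example, the old format-an-int-into-a-buffer pattern might now look like the following. This is a sketch; I&#39;m assuming &lt;code&gt;buffered()&lt;/code&gt; is the right way to get at the bytes written so far:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var buf: [32]u8 = undefined;
var w: std.Io.Writer = .fixed(&amp;buf);
try w.print(&quot;{d}&quot;, .{9000});
const formatted = w.buffered(); // the slice of buf that was written to&lt;/code&gt;&lt;/pre&gt;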

&lt;p&gt;While migrating, you might run into the following error: &lt;em&gt;no field or member function named &#39;adaptToNewApi&#39; in &#39;...&#39;&lt;/em&gt;. You can see why this happens by looking at the updated implementation of &lt;code&gt;std.fmt.format&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn format(writer: anytype, comptime fmt: []const u8, args: anytype) !void {
    var adapter = writer.adaptToNewApi();
    return adapter.new_interface.print(fmt, args) catch |err| switch (err) {
        error.WriteFailed =&gt; return adapter.err.?,
    };
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because this functionality was moved to &lt;code&gt;std.Io.Writer&lt;/code&gt;, any &lt;code&gt;writer&lt;/code&gt; passed into &lt;code&gt;format&lt;/code&gt; has to be able to upgrade itself to the new interface. This is done, again only by convention, by having the &quot;old&quot; writer expose an &lt;code&gt;adaptToNewApi&lt;/code&gt; method which returns a type that exposes a &lt;code&gt;new_interface: std.Io.Writer&lt;/code&gt; field. This is pretty easy to implement using the basic &lt;code&gt;drain&lt;/code&gt; implementation, and you can find a handful of examples in the standard library, but it&#39;s of little help if you don&#39;t control the legacy writer.&lt;/p&gt;
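
&lt;p&gt;Here&#39;s roughly what such an adapter can look like, reusing the basic &lt;code&gt;drain&lt;/code&gt; from above. &lt;code&gt;OldWriter&lt;/code&gt; is a stand-in for your legacy writer type, and while &lt;code&gt;new_interface&lt;/code&gt; and &lt;code&gt;err&lt;/code&gt; mirror the convention, this is a sketch, not the standard library&#39;s exact code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn adaptToNewApi(self: *OldWriter) Adapter {
    return .{
        .old = self,
        .new_interface = .{
            .buffer = &amp;.{},
            .vtable = &amp;.{ .drain = Adapter.drain },
        },
    };
}

const Adapter = struct {
    old: *OldWriter,
    err: ?OldWriter.Error = null,
    new_interface: std.Io.Writer,

    fn drain(io_w: *std.Io.Writer, data: []const []const u8, splat: usize) std.Io.Writer.Error!usize {
        _ = splat;
        const a: *Adapter = @fieldParentPtr(&quot;new_interface&quot;, io_w);
        return a.old.write(data[0]) catch |err| {
            // stash the real error; the new interface can only report WriteFailed
            a.err = err;
            return error.WriteFailed;
        };
    }
};&lt;/code&gt;&lt;/pre&gt;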

&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Conclusion&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;I&#39;m hesitant to provide an opinion on this change. I don&#39;t understand language design. However, while I think this is an improvement over the current API, I keep thinking that adding buffering directly to the &lt;code&gt;Writer&lt;/code&gt; isn&#39;t ideal.&lt;/p&gt;

&lt;p&gt;I believe that most languages deal with buffering via composition. You take a reader/writer and wrap it in a BufferedReader or BufferedWriter. This approach seems both simple to understand and implement while being powerful. It can be applied to things beyond buffering and IO. Zig seems to struggle with this model. Rather than provide a cohesive and generic approach for such problems, one specific feature (buffering) for one specific API (IO) was baked into the standard library. Maybe I&#39;m too dense to understand or maybe future changes will address this more holistically.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Zig&#39;s new LinkedList API (it&#39;s time to learn @fieldParentPtr)</title>
		<link href="https://www.openmymind.net/Zigs-New-LinkedList-API/"/>
		<updated>2025-04-10T00:00:00Z</updated>
		<id>/Zigs-New-LinkedList-API/</id>
		<content type="html">
			
&lt;p&gt;In a recent, post-Zig 0.14 commit, Zig&#39;s &lt;code&gt;SinglyLinkedList&lt;/code&gt; and &lt;code&gt;DoublyLinkedList&lt;/code&gt; saw &lt;a href=&quot;https://github.com/ziglang/zig/commit/1639fcea43549853f1fded32aa1d711d21771e1c&quot;&gt;significant changes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The previous version was a generic and, with all the methods removed, looked like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn SinglyLinkedList(comptime T: type) type {
  return struct {
    first: ?*Node = null,

    pub const Node = struct {
      next: ?*Node = null,
      data: T,
    };
  };
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The new version isn&#39;t generic. Rather, you embed the linked list node with your data. This is known as an intrusive linked list and tends to perform better and require fewer allocations. Except in trivial examples, the data that we store in a linked list is typically stored on the heap. Because an intrusive linked list has the linked list node embedded in the data, it doesn&#39;t need its own allocation. Before we jump into an example, this is what the new structure looks like, again, with all methods removed:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const SinglyLinkedList = struct {
  first: ?*Node = null,

  pub const Node = struct {
    next: ?*Node = null,
  };
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Much simpler. And notice that it has no link or reference to any of our data. Here&#39;s a working example that shows how you&#39;d use it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
const SinglyLinkedList = std.SinglyLinkedList;

pub fn main() !void {
    // GeneralPurposeAllocator is being renamed
    // to DebugAllocator. Let&#39;s get used to that name
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    const allocator = gpa.allocator();

    var list: SinglyLinkedList = .{};

    const user1 = try allocator.create(User);
    defer allocator.destroy(user1);
    user1.* = .{
        .id = 1,
        .power = 9000,
        .node = .{},
    };
    list.prepend(&amp;user1.node);

    const user2 = try allocator.create(User);
    defer allocator.destroy(user2);
    user2.* = .{
        .id = 2,
        .power = 9001,
        .node = .{},
    };
    list.prepend(&amp;user2.node);

    var node = list.first;
    while (node) |n| {
        std.debug.print(&quot;{any}&#92;n&quot;, .{n});
        node = n.next;
    }
}

const User = struct {
    id: i64,
    power: u32,
    node: SinglyLinkedList.Node,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To run this code, you&#39;ll need a nightly release from within the last week. What do you think the output will be? You should see something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
SinglyLinkedList.Node{ .next = SinglyLinkedList.Node{ .next = null } }
SinglyLinkedList.Node{ .next = null }&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&#39;re only getting the nodes, and, as we can see here and from the above skeleton structure of the new &lt;code&gt;SinglyLinkedList&lt;/code&gt;, there&#39;s nothing about our users. Users have nodes, but there&#39;s seemingly nothing that links a node back to its containing user. Or is there?&lt;/p&gt;

&lt;p&gt;In the past, we&#39;ve described how &lt;a href=&quot;https://www.openmymind.net/learning_zig/pointers/&quot;&gt;the compiler uses the type information&lt;/a&gt; to figure out how to access fields. For example, when we execute &lt;code&gt;user1.power&lt;/code&gt;, the compiler knows that:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code&gt;id&lt;/code&gt; is +0 bytes from the start of the structure,
  &lt;/li&gt;&lt;li&gt;&lt;code&gt;power&lt;/code&gt; is +8 bytes from the start of the structure (because id is an i64), and
  &lt;/li&gt;&lt;li&gt;&lt;code&gt;power&lt;/code&gt; is a u32
&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;With this information, the compiler knows how to access &lt;code&gt;power&lt;/code&gt; from &lt;code&gt;user1&lt;/code&gt; (i.e. jump forward 8 bytes, read 4 bytes and treat it as a u32). But if you think about it, that logic is simple to reverse. If we know the address of &lt;code&gt;power&lt;/code&gt;, then the address of &lt;code&gt;user&lt;/code&gt; has to be &lt;code&gt;address_of_power - 8&lt;/code&gt;. We can prove this:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var user = User{
        .id = 1,
        .power = 9000,
    };
    std.debug.print(&quot;address of user: {*}&#92;n&quot;, .{&amp;user});

    const address_of_power = &amp;user.power;
    std.debug.print(&quot;address of power: {*}&#92;n&quot;, .{address_of_power});

    const power_offset = 8;
    const also_user: *User = @ptrFromInt(@intFromPtr(address_of_power) - power_offset);
    std.debug.print(&quot;address of also_user: {*}&#92;n&quot;, .{also_user});

    std.debug.print(&quot;also_user: {}&#92;n&quot;, .{also_user});
}

const User = struct {
    id: i64,
    power: u32,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The magic happens here:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const power_offset = 8;
const also_user: *User = @ptrFromInt(@intFromPtr(address_of_power) - power_offset);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&#39;re turning the address of our user&#39;s power field, &lt;code&gt;&amp;user.power&lt;/code&gt; into an integer, subtracting 8 (8 bytes, 64 bits), and telling the compiler that it should treat that memory as a &lt;code&gt;*User&lt;/code&gt;. This code will &lt;em&gt;probably&lt;/em&gt; work for you, but it isn&#39;t safe. Specifically, unless we&#39;re using a packed or extern struct, Zig makes no guarantees about the layout of a structure. It could put &lt;code&gt;power&lt;/code&gt; BEFORE &lt;code&gt;id&lt;/code&gt;, in which case our &lt;code&gt;power_offset&lt;/code&gt; should be 0. It could add padding after every field. It can do anything it wants. To make this code safer, we use the &lt;code&gt;@offsetOf&lt;/code&gt; builtin to get the actual byte-offset of a field with respect to its struct:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const power_offset = @offsetOf(User, &quot;power&quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Back to our linked list, given that we have the address of a &lt;code&gt;node&lt;/code&gt; and we know that it is part of the &lt;code&gt;User&lt;/code&gt; structure, we &lt;em&gt;are&lt;/em&gt; able to get the &lt;code&gt;User&lt;/code&gt; from a node. Rather than use the above code though, we&#39;ll use the &lt;em&gt;slightly&lt;/em&gt; friendlier &lt;code&gt;@fieldParentPtr&lt;/code&gt; builtin. Our &lt;code&gt;while&lt;/code&gt; loop changes to:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
while (node) |n| {
  const user: *User = @fieldParentPtr(&quot;node&quot;, n);
  std.debug.print(&quot;{any}&#92;n&quot;, .{user});
  node = n.next;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We give &lt;code&gt;@fieldParentPtr&lt;/code&gt; the name of the field, a pointer to that field as well as a return type (which is inferred above by the assignment to a &lt;code&gt;*User&lt;/code&gt; variable), and it gives us back the instance that contains that field.&lt;/p&gt;
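
&lt;p&gt;When there&#39;s no typed variable for the return type to be inferred from, you can supply it explicitly with &lt;code&gt;@as&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const user = @as(*User, @fieldParentPtr(&quot;node&quot;, n));&lt;/code&gt;&lt;/pre&gt;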

&lt;p&gt;Performance aside, I have mixed feelings about the new API. My initial reaction is that I dislike exposing, what I consider, a complicated builtin like &lt;code&gt;@fieldParentPtr&lt;/code&gt; for something as trivial as using a linked list. However, while &lt;code&gt;@fieldParentPtr&lt;/code&gt; seems esoteric, it&#39;s quite useful and developers should be familiar with it because it can help solve problems that are otherwise difficult to solve.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Allocator.resize</title>
		<link href="https://www.openmymind.net/Allocator-resize/"/>
		<updated>2025-03-27T00:00:00Z</updated>
		<id>/Allocator-resize/</id>
		<content type="html">
			
&lt;p&gt;There are four important methods on Zig&#39;s &lt;code&gt;std.mem.Allocator&lt;/code&gt; interface that Zig developers must be comfortable with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;alloc(T, n)&lt;/code&gt; - which creates an array of &lt;code&gt;n&lt;/code&gt; items of type &lt;code&gt;T&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;free(ptr)&lt;/code&gt; - which frees memory allocated with &lt;code&gt;alloc&lt;/code&gt; (although, this &lt;a href=&quot;https://www.openmymind.net/ArenaAllocator-free-and-Nested-Arenas/&quot;&gt;is implementation specific&lt;/a&gt;),&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;create(T)&lt;/code&gt; - which creates a single item of type &lt;code&gt;T&lt;/code&gt;, and
  &lt;/li&gt;&lt;li&gt;&lt;code&gt;destroy(ptr)&lt;/code&gt; - which destroys an item created with &lt;code&gt;create&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
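
&lt;p&gt;As a quick refresher, here are the four together (&lt;code&gt;User&lt;/code&gt; is just a stand-in type):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

const User = struct { id: i64 };

pub fn main() !void {
    var gpa: std.heap.DebugAllocator(.{}) = .init;
    const allocator = gpa.allocator();

    // an array of 10 i64s
    const ids = try allocator.alloc(i64, 10);
    defer allocator.free(ids);

    // a single User
    const user = try allocator.create(User);
    defer allocator.destroy(user);
    user.* = .{ .id = 1 };
}&lt;/code&gt;&lt;/pre&gt;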

&lt;p&gt;While you might never need to use them, the &lt;code&gt;Allocator&lt;/code&gt; interface has other methods which, if nothing else, can be useful to be aware of and informative to learn about.&lt;/p&gt;

&lt;p&gt;In particular, the &lt;code&gt;resize&lt;/code&gt; method is used to try to resize an existing allocation to a larger (or smaller) size. The main promise of &lt;code&gt;resize&lt;/code&gt; is that it&#39;s guaranteed &lt;em&gt;not&lt;/em&gt; to move the pointer. However, to satisfy that guarantee, resize is allowed to fail, in which case nothing changes.&lt;/p&gt;

&lt;p&gt;We can imagine a simple allocation:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// var buf = try allocator.alloc(u8, 5);
// buf[0] = &#39;h&#39;

           0x102e00000
           -------------------------------
buf.ptr -&gt; |  h  |     |     |     |     |
           -------------------------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, if we were to call &lt;code&gt;allocator.resize(buf, 7)&lt;/code&gt;, there are two possible outcomes. The first is that the call returns &lt;code&gt;false&lt;/code&gt;, indicating that the resize operation failed, and thus nothing changed:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
           0x102e00000
           -------------------------------
buf.ptr -&gt; |  h  |     |     |     |     |
           -------------------------------&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;However, when &lt;code&gt;resize&lt;/code&gt; succeeds and returns &lt;code&gt;true&lt;/code&gt;, the allocated space has grown without having relocated (i.e. moved) the pointer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
           0x102e00000
           -------------------------------------------
buf.ptr -&gt; |  h  |     |     |     |     |     |     |
           -------------------------------------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under what circumstances &lt;code&gt;resize&lt;/code&gt; succeeds or fails is a black box. It depends on a lot of factors and is going to be allocator-specific. For example, for me, this code prints &lt;code&gt;false&lt;/code&gt;, indicating that the resize failed:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .init;
    const allocator = gpa.allocator();
    defer _ = gpa.detectLeaks();

    const buf = try allocator.alloc(usize, 10);
    std.debug.print(&quot;{any}&#92;n&quot;, .{allocator.resize(buf, 20)});
    allocator.free(buf);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because we&#39;re using a &lt;code&gt;GeneralPurposeAllocator&lt;/code&gt; (that name is deprecated in Zig 0.14 in favor of &lt;code&gt;DebugAllocator&lt;/code&gt;) we could dive into its internals and try to leverage knowledge of its implementation to force a resize to succeed, but a simpler option is to resize our buffer to &lt;code&gt;0&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .init;
    const allocator = gpa.allocator();
    defer _ = gpa.detectLeaks();

    const buf = try allocator.alloc(usize, 10);
    // change 20 -&gt; 0
    std.debug.print(&quot;{any}&#92;n&quot;, .{allocator.resize(buf, 0)});
    allocator.free(buf);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Success, the code now prints &lt;code&gt;true&lt;/code&gt;, indicating that the resize succeeded. However, I also get a &lt;strong&gt;segfault&lt;/strong&gt;. Can you guess what we&#39;re doing wrong?&lt;/p&gt;

&lt;p&gt;In our above visualization, we saw how a successful resize does not move our pointer. We can confirm this by looking at the address of &lt;code&gt;buf.ptr&lt;/code&gt; before and after our resize. This code still segfaults, but it prints out the information first:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .init;
    const allocator = gpa.allocator();
    defer _ = gpa.detectLeaks();

    const buf = try allocator.alloc(usize, 10);
    std.debug.print(&quot;address before resize: {*}&#92;n&quot;, .{buf.ptr});
    std.debug.print(&quot;resize succeeded: {any}&#92;n&quot;, .{allocator.resize(buf, 0)});
    std.debug.print(&quot;address after resize: {*}&#92;n&quot;, .{buf.ptr});
    allocator.free(buf);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So far, we&#39;ve only considered the &lt;code&gt;ptr&lt;/code&gt; of our slice, but, like the criminal justice system, a slice is represented by two separate yet equally important groups: a &lt;code&gt;ptr&lt;/code&gt; and a &lt;code&gt;len&lt;/code&gt;. If we change our code to also look at the &lt;code&gt;len&lt;/code&gt; of &lt;code&gt;buf&lt;/code&gt;, the issue might become more obvious:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// change the 1st and 3rd line to also print buf.len:
std.debug.print(&quot;address &amp; len before resize: {*} {d}&#92;n&quot;, .{buf.ptr, buf.len});
std.debug.print(&quot;resize succeeded: {any}&#92;n&quot;, .{allocator.resize(buf, 0)});
std.debug.print(&quot;address &amp; len after resize: {*} {d}&#92;n&quot;, .{buf.ptr, buf.len});&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is what I get:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
address &amp; len before resize: usize@100280000 10
resize succeeded: true
address &amp; len after resize: usize@100280000 10
Segmentation fault at address 0x100280000&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;While it isn&#39;t the cleanest output, notice that even after a successful resize, the &lt;code&gt;ptr&lt;/code&gt; is unchanged and the length remains &lt;code&gt;10&lt;/code&gt;. Herein lies our problem: &lt;code&gt;resize&lt;/code&gt; updates the underlying memory, it doesn&#39;t update the length of the slice. That&#39;s something we need to take care of ourselves. Here&#39;s a non-crashing version:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var gpa: std.heap.GeneralPurposeAllocator(.{}) = .init;
    const allocator = gpa.allocator();
    defer _ = gpa.detectLeaks();

    var buf = try allocator.alloc(usize, 10);
    if (allocator.resize(buf, 0)) {
        std.debug.print(&quot;resize succeeded!&#92;n&quot;, .{});
        buf.len = 0;
    } else {
        // we need to handle the case where resize doesn&#39;t succeed
    }

    allocator.free(buf);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What&#39;s left out of the above code is handling the case where &lt;code&gt;resize&lt;/code&gt; fails. This is application-specific. In most cases, where we&#39;re likely resizing to a larger size, we&#39;ll generally need to fall back to calling &lt;code&gt;alloc&lt;/code&gt; to create the larger buffer and then, most likely, &lt;code&gt;@memcpy&lt;/code&gt; to copy data from the existing (now too small) buffer to the newly created larger one.&lt;/p&gt;
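
&lt;p&gt;That fallback can be sketched as a small helper. (&lt;code&gt;std.mem.Allocator&lt;/code&gt; also has a &lt;code&gt;realloc&lt;/code&gt; method which does this move-if-needed dance for you; the version below just makes the steps explicit.)&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn grow(allocator: std.mem.Allocator, buf: []usize, new_len: usize) ![]usize {
    if (allocator.resize(buf, new_len)) {
        // success: same pointer, we just have to fix the slice&#39;s len ourselves
        var grown = buf;
        grown.len = new_len;
        return grown;
    }
    // failure: allocate a larger buffer, copy the old data over, free the old buffer
    const bigger = try allocator.alloc(usize, new_len);
    @memcpy(bigger[0..buf.len], buf);
    allocator.free(buf);
    return bigger;
}&lt;/code&gt;&lt;/pre&gt;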


			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>ArenaAllocator.free and Nested Arenas</title>
		<link href="https://www.openmymind.net/ArenaAllocator-free-and-Nested-Arenas/"/>
		<updated>2025-03-15T00:00:00Z</updated>
		<id>/ArenaAllocator-free-and-Nested-Arenas/</id>
		<content type="html">
			&lt;p&gt;What happens when you &lt;code&gt;free&lt;/code&gt; with an ArenaAllocator? You might be tempted to look at the documentation for &lt;a href=&quot;https://ziglang.org/documentation/master/std/#std.mem.Allocator.free&quot;&gt;std.mem.Allocator.free&lt;/a&gt; which says &quot;Free an array allocated with alloc&quot;. But this is the one thing we&#39;re sure it &lt;em&gt;won&#39;t&lt;/em&gt; do.&lt;/p&gt;

&lt;p&gt;In its current implementation, calling &lt;code&gt;free&lt;/code&gt; usually does nothing: the freed memory isn&#39;t made available for subsequent allocations by the arena, and it certainly isn&#39;t released back to the operating system. However, under specific conditions &lt;code&gt;free&lt;/code&gt; will make the memory re-usable by the arena. The only way to really &quot;free&quot; the memory is to call &lt;code&gt;deinit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The only case when we&#39;re guaranteed that the memory will be reusable by the arena is when it was the last allocation made:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const str1 = try arena.dupe(u8, &quot;Over 9000!!!&quot;);
arena.free(str1);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Above, whatever memory was allocated to duplicate our string will be available for subsequent allocations made with &lt;code&gt;arena&lt;/code&gt;. In the following case, the two calls to &lt;code&gt;arena.free&lt;/code&gt; do nothing:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const str1 = try arena.dupe(u8, &quot;ab&quot;);
const str2 = try arena.dupe(u8, &quot;12&quot;);
arena.free(str1);
arena.free(str2);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To &quot;fix&quot; this code, we&#39;d need to reverse the order of the two frees:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const str1 = try arena.dupe(u8, &quot;ab&quot;);
const str2 = try arena.dupe(u8, &quot;12&quot;);
arena.free(str2);  //swapped this line with the next
arena.free(str1);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, when we call &lt;code&gt;arena.free(str2)&lt;/code&gt;, the memory allocated for &lt;code&gt;str2&lt;/code&gt; will be available to subsequent allocations. But what happens when we call &lt;code&gt;arena.free(str1)&lt;/code&gt;? The answer, again, is: &lt;em&gt;it depends&lt;/em&gt;. It has to do with the internal state of the arena. Simplistically, an &lt;code&gt;ArenaAllocator&lt;/code&gt; keeps a linked list of memory buffers. Imagine something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
buffer_list.head -&gt; ------------
                    |   next   | -&gt; null
                    |   ----   |
                    |          |
                    |          |
                    |          |
                    |          |
                    |          |
                    ------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our linked list has a single node along with 5 bytes of available space. After we allocate &lt;code&gt;str1&lt;/code&gt;, it looks like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
buffer_list.head -&gt; ------------
                    |   next   | -&gt; null
                    |   ----   |
            str1 -&gt; |    a     |
                    |    b     |
                    |          |
                    |          |
                    |          |
                    ------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then, when we allocate &lt;code&gt;str2&lt;/code&gt;, it looks like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
buffer_list.head -&gt; ------------
                    |   next   | -&gt; null
                    |   ----   |
            str1 -&gt; |    a     |
                    |    b     |
            str2 -&gt; |    1     |
                    |    2     |
                    |          |
                    ------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When we free &lt;code&gt;str2&lt;/code&gt;, it goes back to how it was before:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
buffer_list.head -&gt; ------------
                    |   next   | -&gt; null
                    |   ----   |
            str1 -&gt; |    a     |
                    |    b     |
                    |          |
                    |          |
                    |          |
                    ------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which means that when we &lt;code&gt;arena.free(str1)&lt;/code&gt;, it &lt;strong&gt;will&lt;/strong&gt; make that memory available again. However, if instead of allocating two strings, we allocate three:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const str1 = try arena.dupe(u8, &quot;ab&quot;);
const str2 = try arena.dupe(u8, &quot;12&quot;);
const str3 = try arena.dupe(u8, &quot;()&quot;);
arena.free(str3);
arena.free(str2);
arena.free(str1);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our first buffer doesn&#39;t have enough space for the new string, so a new node is prepended to our linked list:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
buffer_list.head -&gt; ------------    ------------
                    |   next   | -&gt; |   next   | -&gt; null
                    |   ----   |    |   ----   |
            str3 -&gt; |    (     |    |    a     | &lt;- str1
                    |    )     |    |    b     |
                    |          |    |    1     | &lt;- str2
                    |          |    |    2     |
                    |          |    |          |
                    ------------    ------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When we call &lt;code&gt;arena.free(str3)&lt;/code&gt;, the memory for that allocation will be made available, but subsequent frees, even if they&#39;re in the correct order (i.e. freeing &lt;code&gt;str2&lt;/code&gt; then &lt;code&gt;str1&lt;/code&gt;), will be noops. The ArenaAllocator only ever acts on the head of our linked list; it can&#39;t reach back into older nodes, even when the head is empty.&lt;/p&gt;

&lt;p&gt;In short, when we &lt;code&gt;free&lt;/code&gt; the last allocation, that memory will &lt;em&gt;always&lt;/em&gt; be made available. But subsequent &lt;code&gt;frees&lt;/code&gt; only behave this way if (a) they&#39;re also in order and (b) happen to be allocated within the same internal memory node.&lt;/p&gt;

&lt;h3 id=&quot;nested&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#nested&quot; aria-hidden=&quot;true&quot;&gt;Nested Arenas&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Zig&#39;s allocators are said to be composable. When we create an &lt;code&gt;ArenaAllocator&lt;/code&gt;, we pass a single parameter: an allocator. That parent allocator &lt;sup&gt;(1)&lt;/sup&gt; can be any other type of allocator. You can, for example, create an &lt;code&gt;ArenaAllocator&lt;/code&gt; on top of a &lt;code&gt;FixedBufferAllocator&lt;/code&gt;. You can also create an &lt;code&gt;ArenaAllocator&lt;/code&gt; on top of another &lt;code&gt;ArenaAllocator&lt;/code&gt;.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;&lt;sup&gt;(1)&lt;/sup&gt; Zig calls this the &quot;child allocator&quot;, but that doesn&#39;t make any sense to me.&lt;/p&gt;&lt;/aside&gt;

&lt;p&gt;This kind of thing often happens within libraries, where an API takes an &lt;code&gt;std.mem.Allocator&lt;/code&gt; and the library creates an &lt;code&gt;ArenaAllocator&lt;/code&gt; on top of it. What happens when the provided allocator is already an arena? Libraries aside, I mean something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var parent_arena = ArenaAllocator.init(gpa_allocator);
const parent_allocator = parent_arena.allocator();

var inner_arena = ArenaAllocator.init(parent_allocator);
const inner_allocator = inner_arena.allocator();

_ = try inner_allocator.dupe(u8, &quot;Over &quot;);
_ = try inner_allocator.dupe(u8, &quot;9000!&quot;);

inner_arena.deinit();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It does work, but at best, when &lt;code&gt;deinit&lt;/code&gt; is called, the memory will be made available to be re-used by &lt;code&gt;parent_arena&lt;/code&gt;. Except in simple cases, allocations made by &lt;code&gt;inner_arena&lt;/code&gt; are likely to span multiple buffers of &lt;code&gt;parent_arena&lt;/code&gt;, and of course you can still make allocations directly in &lt;code&gt;parent_arena&lt;/code&gt;, which can create its own new buffers or simply make the ordering requirement impossible to fulfill. For example, if we make an allocation in &lt;code&gt;parent_arena&lt;/code&gt; before &lt;code&gt;inner_arena.deinit();&lt;/code&gt; is called:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
_ = try parent_allocator.dupe(u8, &quot;!!!&quot;);
inner_arena.deinit();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then the &lt;code&gt;deinit&lt;/code&gt; does nothing.&lt;/p&gt;

&lt;p&gt;So while nesting ArenaAllocators works, I don&#39;t think there&#39;s any advantage over using a single arena. And, in many cases where you have an &quot;inner_arena&quot;, like in a library, it&#39;s better if the caller provides a non-arena parent allocator so that all the memory is really freed when the library is done with it. Of course, there&#39;s a transparency issue here. Unless the library documents exactly how it&#39;s using your provided allocator, or unless you explore the code - and assuming the implementation doesn&#39;t change - it&#39;s hard to know what you should use.&lt;/p&gt;


			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Zig&#39;s dot star syntax (value.*)</title>
		<link href="https://www.openmymind.net/Zig-Dot-Star-Syntax/"/>
		<updated>2025-03-07T00:00:00Z</updated>
		<id>/Zig-Dot-Star-Syntax/</id>
		<content type="html">
			
&lt;p&gt;Maybe I&#39;m the only one, but it always takes my little brain a split second to understand what&#39;s happening whenever I see, or have to write, something like &lt;code&gt;value.* = .{...}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we take a step back, a variable is just a convenient name for an address on the stack. When this function executes:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn isOver9000(power: i64) bool {
    return power &gt; 9000;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Say, with a &lt;code&gt;power&lt;/code&gt; of 593, we could visualize its stack as:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
power -&gt;  -------------
          |    593    |
          -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we changed our function to take a pointer to an integer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// i64 changed to *i64
fn isOver9000(power: *i64) bool {
    return power &gt; 9000;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our &lt;code&gt;power&lt;/code&gt; argument would still be a label for a stack address, but instead of directly containing a number, the stack value would itself be an address. That&#39;s the &lt;em&gt;indirection&lt;/em&gt; of pointers:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
power -&gt;  -------------
          | 1182145c0 |------------------------
          -------------                        |
                                               |
          .............  empty space           |
          .............  or other data         |
                                               |
          -------------                        |
          |    593    | &lt;----------------------
          -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But this code doesn&#39;t work: it&#39;s trying to compare a &lt;code&gt;comptime_int&lt;/code&gt; (&lt;code&gt;9000&lt;/code&gt;) with a &lt;code&gt;*i64&lt;/code&gt;. We need to make another change to the function:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// i64 changed to *i64
fn isOver9000(power: *i64) bool {
    // power changed to power.*
    return power.* &gt; 9000;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;power.*&lt;/code&gt; is how we dereference a pointer. Dereferencing means to get the value pointed to by a pointer. From our above visualization, you could say that the &lt;code&gt;.*&lt;/code&gt; follows the arrow to get the value, &lt;code&gt;593&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This same syntax works for writing as well. The following is valid:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn isOver9000(power: *i64) bool {
    power.* = 9001;
    return true;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Like before, the dereferencing operator (&lt;code&gt;.*&lt;/code&gt;) &quot;follows&quot; the pointer, but now that it&#39;s on the receiving end of an assignment, we write the value into the pointed-at memory.&lt;/p&gt;

&lt;p&gt;This is all true for more complex types. Let&#39;s say we have a &lt;code&gt;User&lt;/code&gt; struct with an &lt;code&gt;id&lt;/code&gt; and a &lt;code&gt;name&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const User = struct {
    id: i32,
    name: []const u8,
};

var user = User{
    .id = 900,
    .name = &quot;Teg&quot;
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;user&lt;/code&gt; variable is a label for the location of [the start of] the user:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
user  -&gt;  -------------
          |    900    |
          -------------
          |     3     |
          -------------
          | 3c9414e99 | -----------------------
          -------------                        |
                                               |
          .............  empty space           |
          .............  or other data         |
                                               |
          -------------                        |
          |     T     | &lt;----------------------
          -------------
          |     e     |
          -------------
          |     g     |
          -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A slice in Zig, like our &lt;code&gt;[]const u8&lt;/code&gt;, is a length (&lt;code&gt;3&lt;/code&gt;) and a pointer to the values. Now, if we were to take the address of &lt;code&gt;user&lt;/code&gt;, via &lt;code&gt;&amp;user&lt;/code&gt;, we introduce a level of indirection. For example, imagine this code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

const User = struct {
    id: i32,
    name: []const u8,
};

pub fn main() !void {
    var user = User{
        .id = 900,
        .name = &quot;Teg&quot;
    };
    updateUser(&amp;user);
    std.debug.print(&quot;{d}&#92;n&quot;, .{user.id});
}

fn updateUser(user: *User) void {
    user.id += 100000;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;user&lt;/code&gt; parameter of our &lt;code&gt;updateUser&lt;/code&gt; function is pointing to the &lt;code&gt;user&lt;/code&gt; on &lt;code&gt;main&lt;/code&gt;&#39;s stack:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
updateUser
user  -&gt;   -------------
           |  83abcc30 |------------------------
           -------------                        |
                                                |
           .............  empty space           |
           .............  or other data         |
                                                |
main                                            |
user  -&gt;   -------------                        |
           |    900    | &lt;----------------------
           -------------
           |     3     |
           -------------
           | 3c9414e99 | -----------------------
           -------------                        |
                                                |
           .............  empty space           |
           .............  or other data         |
                                                |
           -------------                        |
           |     T     | &lt;----------------------
           -------------
           |     e     |
           -------------
           |     g     |
           -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because we&#39;re referencing &lt;code&gt;main&lt;/code&gt;&#39;s &lt;code&gt;user&lt;/code&gt; (rather than a copy), any changes we make will be reflected in &lt;code&gt;main&lt;/code&gt;. But we aren&#39;t limited to operating on fields of &lt;code&gt;user&lt;/code&gt;; we can operate on its entire memory.&lt;/p&gt;

&lt;p&gt;Of course, we can create a copy of the id field (assignments are always copies; it&#39;s just a matter of knowing &lt;em&gt;what&lt;/em&gt; we&#39;re copying):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn updateUser(user: *User) void {
    const id = user.id;
    // ....
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And now the stack for our function looks like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
user  -&gt;  -------------
          |  83abcc30 |
id    -&gt;  -------------
          |    900    |
          -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But we can also copy the entire user:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn updateUser(user: *User) void {
    const copy = user.*;
    // ....
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which gives us something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
updateUser
user  -&gt;  -------------
          |  83abcc30 |---------------------
copy  -&gt;  -------------                     |
          |    900    |                     |
          -------------                     |
          |     3     |                     |
          -------------                     |
          | 3c9414e99 | --------------------|--
          -------------                     |  |
                                            |  |
          .............  empty space        |  |
          .............  or other data      |  |
                                            |  |
main                                        |  |
user   -&gt; -------------                     |  |
          |    900    | &lt;-------------------   |
          -------------                        |
          |     3     |                        |
          -------------                        |
          | 3c9414e99 | -----------------------|
          -------------                        |
                                               |
          .............  empty space           |
          .............  or other data         |
                                               |
          -------------                        |
          |     T     | &lt;----------------------
          -------------
          |     e     |
          -------------
          |     g     |
          -------------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that it didn&#39;t create a copy of the value &#39;Teg&#39;. You could call this copying &quot;shallow&quot;: it copied the &lt;code&gt;900&lt;/code&gt;, the &lt;code&gt;3&lt;/code&gt; (name length) and the &lt;code&gt;3c9414e99&lt;/code&gt; (address of the name pointer).&lt;/p&gt;

&lt;p&gt;Just like our simpler example above, we can also assign using the dereferencing operator:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn updateUser(user: *User) void {
    // using type inference
    // could be more explicit and do
    // user.* = User{....}

    user.* = .{
        .id = 5,
        .name = &quot;Paul&quot;,
    };
}&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This doesn&#39;t copy anything; it writes into the address that we were given, the address of &lt;code&gt;main&lt;/code&gt;&#39;s &lt;code&gt;user&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
updateUser
user  -&gt;  -------------
          |  83abcc30 |------------------------
          -------------                        |
                                               |
          .............  empty space           |
          .............  or other data         |
                                               |
main                                            |
user  -&gt;  -------------                        |
          |     5     | &lt;----------------------
          -------------
          |     4     |
          -------------
          | 9bf4a990  | -----------------------
          -------------                        |
                                               |
          .............  empty space           |
          .............  or other data         |
                                               |
          -------------                        |
          |     P     | &lt;----------------------
          -------------
          |     a     |
          -------------
          |     u     |
          -------------
          |     l     |
          -------------&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;If you&#39;re still not fully comfortable with this, and if you haven&#39;t done so already, you might be interested in the &lt;a href=&quot;https://www.openmymind.net/learning_zig/pointers/&quot;&gt;pointers&lt;/a&gt; and &lt;a href=&quot;https://www.openmymind.net/learning_zig/stack_memory/&quot;&gt;stack memory&lt;/a&gt; parts of my learning zig series.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>GetOrPut With String Keys</title>
		<link href="https://www.openmymind.net/GetOrPut-With-String-Keys/"/>
		<updated>2025-02-27T00:00:00Z</updated>
		<id>/GetOrPut-With-String-Keys/</id>
		<content type="html">
			
&lt;p&gt;I&#39;ve previously blogged about how much I like Zig&#39;s &lt;code&gt;getOrPut&lt;/code&gt; hashmap method. As a brief recap, we can visualize Zig&#39;s hashmap as two arrays:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
  keys:               values:
       --------          --------
       | Paul  |         | 1234 |     @mod(hash(&quot;Paul&quot;), 5) == 0
       --------          --------
       |      |          |      |
       --------          --------
       |      |          |      |
       --------          --------
       | Goku |          | 9001 |    @mod(hash(&quot;Goku&quot;), 5) == 3
       --------          --------
       |      |          |      |
       --------          --------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When we call &lt;code&gt;get(&quot;Paul&quot;)&lt;/code&gt;, we could think of this simplified implementation:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn get(map: *Self, key: K) ?V {
  const index = map.getIndexOf(key) orelse return null;
  return map.values[index];
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And, when we call &lt;code&gt;getPtr(&quot;Paul&quot;)&lt;/code&gt;, we&#39;d have this implementation:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn getPtr(map: *Self, key: K) ?*V {
  const index = map.getIndexOf(key) orelse return null;
  // notice the added &#39;&amp;&#39;
  // we&#39;re taking the address of the array index
  return &amp;map.values[index];
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By taking the address of the value directly from the hashmap&#39;s array, we avoid copying the entire value. That can have performance implications (though not for the integer value we&#39;re using here). It also allows us to directly manipulate that slot of the array:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const value = map.getPtr(&quot;Paul&quot;) orelse return;
value.* = 10;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a powerful feature, but a dangerous one. If the underlying array changes, as can happen when items are added to the map, &lt;code&gt;value&lt;/code&gt; would become invalid. So, while &lt;code&gt;getPtr&lt;/code&gt; is useful, it requires mindfulness: try to minimize the scope of such references.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;Currently, Zig&#39;s HashMap doesn&#39;t shrink when items are removed, so, for now, removing items doesn&#39;t invalidate any pointers into the hashmap. But expect that to change at some point.&lt;/p&gt;&lt;/aside&gt;

&lt;h3 id=&quot;getOrPut&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#getOrPut&quot; aria-hidden=&quot;true&quot;&gt;GetOrPut&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;getOrPut&lt;/code&gt; builds on the above concept. It returns a pointer to the value &lt;strong&gt;and&lt;/strong&gt; the key, creating the entry in the hashmap if necessary. For example, given that we already have an entry for &quot;Paul&quot;, if we call &lt;code&gt;map.getOrPut(&quot;Paul&quot;)&lt;/code&gt;, we&#39;d get back a &lt;code&gt;value_ptr&lt;/code&gt; that points to a slot in the hashmap&#39;s &lt;code&gt;values&lt;/code&gt; array, as well as a &lt;code&gt;key_ptr&lt;/code&gt; that points to a slot in the hashmap&#39;s &lt;code&gt;keys&lt;/code&gt; array. If the requested key &lt;em&gt;doesn&#39;t&lt;/em&gt; exist, we get back the same two pointers, and it&#39;s our responsibility to set the value.&lt;/p&gt;

&lt;p&gt;If I asked you to increment counters inside of a hashmap, without &lt;code&gt;getOrPut&lt;/code&gt;, you&#39;d end up with two hash lookups:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// Go
count, exists := counters[&quot;hits&quot;]
if exists == false {
  counters[&quot;hits&quot;] = 1
} else {
  counters[&quot;hits&quot;] = count + 1;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With &lt;code&gt;getOrPut&lt;/code&gt;, it&#39;s a single hash lookup:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const gop = try counters.getOrPut(&quot;hits&quot;);
if (gop.found_existing) {
  gop.value_ptr.* += 1;
} else {
  gop.value_ptr.* = 1;
}&lt;/code&gt;&lt;/pre&gt;


&lt;h3 id=&quot;stringKeys&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#stringKeys&quot; aria-hidden=&quot;true&quot;&gt;getOrPut With String Keys&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It seems trivial, but the most important thing to understand about &lt;code&gt;getOrPut&lt;/code&gt; is that it will set the key for you if the entry has to be created. In our last example, notice that even when &lt;code&gt;gop.found_existing == false&lt;/code&gt;, we never set &lt;code&gt;key_ptr&lt;/code&gt; - &lt;code&gt;getOrPut&lt;/code&gt; automatically sets it to the key we pass in, i.e. &lt;code&gt;&quot;hits&quot;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we were to put a breakpoint after &lt;code&gt;getOrPut&lt;/code&gt; returns but before we set the value, we&#39;d see that our two arrays look something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
  keys:               values:
       --------          --------
       |      |          |      |
       --------          --------
       | hits |          | ???? |
       --------          --------
       |      |          |      |
       --------          --------&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Where the entry in the &lt;code&gt;keys&lt;/code&gt; array is set, but the corresponding entry in &lt;code&gt;values&lt;/code&gt; is left undefined. You&#39;ll note that &lt;code&gt;getOrPut&lt;/code&gt; doesn&#39;t take a value. I assume this is because, in some cases, the value might be expensive to derive, so the current API lets us avoid calculating it when &lt;code&gt;gop.found_existing == true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is important for keys that need to be owned by the hashmap. Most commonly strings, but this applies to any other key we &quot;manage&quot;. Taking a step back, if we wanted to track hits in a hashmap, and, most likely, we wanted the lifetime of the keys to be tied to the hashmap, we&#39;d do something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn register(allocator: Allocator, map: *std.StringHashMap(u32), name: []const u8) !void {
  const owned = try allocator.dupe(u8, name);
  try map.put(owned, 0);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Creating our &quot;owned&quot; copy of &lt;code&gt;name&lt;/code&gt; frees the caller from having to keep &lt;code&gt;name&lt;/code&gt; valid beyond the call to &lt;code&gt;register&lt;/code&gt;. Now, if a key is removed, or the entire map cleaned up, we need to free the keys. That&#39;s why I like the name &quot;owned&quot;: it means the hash map &quot;owns&quot; the key (i.e. is responsible for freeing it):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var it = map.keyIterator();
while (it.next()) |key_ptr| {
  allocator.free(key_ptr.*);
}
map.deinit();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The interaction between key ownership and &lt;code&gt;getOrPut&lt;/code&gt; is worth thinking about. If we try to merge this ownership idea with our incrementing counter code, we might try:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn hit(allocator: Allocator, map: *std.StringHashMap(u32), name: []const u8) !void {
  const owned = try allocator.dupe(u8, name);
  const gop = try map.getOrPut(owned);
  if (gop.found_existing) {
    gop.value_ptr.* += 1;
  } else {
    gop.value_ptr.* = 1;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But this code has a potential memory leak. Can you spot it? (See &lt;a href=&quot;https://www.openmymind.net/#appendix-a&quot;&gt;Appendix A&lt;/a&gt; for a complete runnable example.) When &lt;code&gt;gop.found_existing == true&lt;/code&gt;, &lt;code&gt;owned&lt;/code&gt; is never used and never freed. One bad option would be to free &lt;code&gt;owned&lt;/code&gt; when the entry already exists:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn hit(allocator: Allocator, map: *std.StringHashMap(u32), name: []const u8) !void {
  const owned = try allocator.dupe(u8, name);
  const gop = try map.getOrPut(owned);
  if (gop.found_existing) {
    // This line was added. But this is a bad solution
    allocator.free(owned);
    gop.value_ptr.* += 1;
  } else {
    gop.value_ptr.* = 1;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It works, but we needlessly &lt;code&gt;dupe&lt;/code&gt; &lt;code&gt;name&lt;/code&gt; if the entry already exists. Rather than prematurely duping the key in case the entry doesn&#39;t exist, we want to delay our &lt;code&gt;dupe&lt;/code&gt; until we know it&#39;s needed. Here&#39;s a better option:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn hit(allocator: Allocator, map: *std.StringHashMap(u32), name: []const u8) !void {
  // we use `name` for the lookup.
  const gop = try map.getOrPut(name);
  if (gop.found_existing) {
    gop.value_ptr.* += 1;
  } else {
    // this line was added
    gop.key_ptr.* = try allocator.dupe(u8, name);
    gop.value_ptr.* = 1;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It might seem reckless to pass &lt;code&gt;name&lt;/code&gt; into &lt;code&gt;getOrPut&lt;/code&gt;. We need the key to remain valid as long as the map entry exists. Aren&#39;t we undermining that requirement? Let&#39;s walk through the code. When &lt;code&gt;hit&lt;/code&gt; is called for a new &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;gop.found_existing&lt;/code&gt; will be false. &lt;code&gt;getOrPut&lt;/code&gt; will insert &lt;code&gt;name&lt;/code&gt; into our &lt;code&gt;keys&lt;/code&gt; array. This is bad because we have no guarantee that &lt;code&gt;name&lt;/code&gt; will remain valid for as long as we need it to be. But the problem is immediately remedied when we overwrite &lt;code&gt;key_ptr.*&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On subsequent calls for an existing &lt;code&gt;name&lt;/code&gt;, when &lt;code&gt;gop.found_existing == true&lt;/code&gt;, the &lt;code&gt;name&lt;/code&gt; is only used as a lookup. It&#39;s no different than doing a &lt;code&gt;getPtr&lt;/code&gt;; &lt;code&gt;name&lt;/code&gt; only has to be valid for the call to &lt;code&gt;getOrPut&lt;/code&gt; because &lt;code&gt;getOrPut&lt;/code&gt; doesn&#39;t keep a reference to it when an existing entry is found.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Conclusion&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This post was a long way to say: don&#39;t be afraid to write to &lt;code&gt;key_ptr.*&lt;/code&gt;. Of course you can screw up your map this way. Consider this example:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn hit(allocator: Allocator, map: *std.StringHashMap(u32), name: []const u8) !void {
    // we use `name` for the lookup.
    const gop = try map.getOrPut(name);
    if (gop.found_existing) {
      gop.value_ptr.* += 1;
    } else {
      // what&#39;s this?
      gop.key_ptr.* = &quot;HELLO&quot;;
      gop.value_ptr.* = 1;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the key is used to organize the map (to find where items go and where they are), jamming random keys where they don&#39;t belong is going to cause issues. But it can also be used correctly and safely, as long as you understand the details.&lt;/p&gt;

&lt;h3 id=&quot;appendix-a&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#appendix-a&quot; aria-hidden=&quot;true&quot;&gt;Appendix A - Memory Leak&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;This code &lt;code&gt;should&lt;/code&gt; report a memory leak.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
const Allocator = std.mem.Allocator;

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    defer _ = gpa.detectLeaks();

    // I&#39;m using the Unmanaged variant because the Managed ones are likely to
    // be removed (which I think is a mistake). Using Unmanaged makes this
    // snippet more future-proof. I explain unmanaged here:
    // https://www.openmymind.net/Zigs-HashMap-Part-1/#Unmanaged
    var map: std.StringHashMapUnmanaged(u32) = .{};
    try hit(allocator, &amp;map, &quot;teg&quot;);
    try hit(allocator, &amp;map, &quot;teg&quot;);

    var it = map.keyIterator();
    while (it.next()) |key_ptr| {
      allocator.free(key_ptr.*);
    }
    map.deinit(allocator);
}

fn hit(allocator: Allocator, map: *std.StringHashMapUnmanaged(u32), name: []const u8) !void {
    const owned = try allocator.dupe(u8, name);
    const gop = try map.getOrPut(allocator, owned);
    if (gop.found_existing) {
      gop.value_ptr.* += 1;
    } else {
      gop.value_ptr.* = 1;
    }
}&lt;/code&gt;&lt;/pre&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Comparing Strings as Integers with @bitCast</title>
		<link href="https://www.openmymind.net/Comparing-Strings-as-Integers-with-bitCast/"/>
		<updated>2025-02-20T00:00:00Z</updated>
		<id>/Comparing-Strings-as-Integers-with-bitCast/</id>
		<content type="html">
			
&lt;p&gt;In the last blog post, we looked at &lt;a href=&quot;https://www.openmymind.net/Switching-On-Strings-In-Zig/&quot;&gt;different ways to compare strings in Zig&lt;/a&gt;. A few posts back, we &lt;a href=&quot;https://www.openmymind.net/Zigs-bitCast/#bitCast&quot;&gt;introduced Zig&#39;s &lt;code&gt;@bitCast&lt;/code&gt;&lt;/a&gt;. As a quick recap, &lt;code&gt;@bitCast&lt;/code&gt; lets us force a specific type onto a value. For example, the following prints 1067282596:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
    const f: f32 = 1.23;
    const n: u32 = @bitCast(f);
    std.debug.print(&quot;{d}&#92;n&quot;, .{n});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What&#39;s happening here is that Zig represents the 32-bit float value of &lt;code&gt;1.23&lt;/code&gt; as: &lt;code&gt;[4]u8{164, 112, 157, 63}&lt;/code&gt;. This is also how Zig represents the 32-bit unsigned integer value of &lt;code&gt;1067282596&lt;/code&gt;. Data is just bytes; it&#39;s the type system - the compiler&#39;s knowledge of what data is what type - that controls what and how that data is manipulated.&lt;/p&gt;
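&lt;p&gt;We can see those bytes directly by bitcasting in the other direction, from the &lt;code&gt;f32&lt;/code&gt; to an array of 4 bytes (a minimal sketch; the byte order shown assumes a little-endian machine):&lt;/p&gt;

```zig
const std = @import("std");

pub fn main() !void {
    const f: f32 = 1.23;
    // reinterpret the float's four bytes as an array of u8
    const bytes: [4]u8 = @bitCast(f);
    // on a little-endian machine: 164, 112, 157, 63
    std.debug.print("{any}\n", .{bytes});
}
```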

&lt;p&gt;It might seem like there&#39;s something special about bitcasting from a float to an integer; they&#39;re both numbers after all. But you can &lt;code&gt;@bitCast&lt;/code&gt; from any two equivalently sized types. Can you guess what this prints?:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);
pub fn main() !void {
    const data = [_]u8{3, 0, 0, 0};
    const x: i32 = @bitCast(data);
    std.debug.print(&quot;{d}&#92;n&quot;, .{x});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The answer is &lt;code&gt;3&lt;/code&gt;. Think about the above snippet a bit more. We&#39;re taking an array of bytes and telling the compiler to treat it like an integer. If we made &lt;code&gt;data&lt;/code&gt; equal to &lt;code&gt;[_]u8{&#39;b&#39;, &#39;l&#39;, &#39;u&#39;, &#39;e&#39;}&lt;/code&gt;, it would still work (and print &lt;code&gt;1702194274&lt;/code&gt;). We&#39;re slowly heading towards being able to compare strings as if they were integers.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;If you&#39;re wondering why 3 is encoded as &lt;code&gt;[4]u8{3, 0, 0, 0}&lt;/code&gt; and not &lt;code&gt;[4]u8{0, 0, 0, 3}&lt;/code&gt;, I talked about binary encoding in my &lt;a href=&quot;https://www.openmymind.net/TCP-Server-In-Zig-Part-2-Message-Boundaries/#binary_encoding&quot;&gt;Learning TCP&lt;/a&gt; series.&lt;/p&gt;&lt;/aside&gt;

&lt;p&gt;From the last post, we could use multiple &lt;code&gt;std.mem.eql&lt;/code&gt; or, more simply, &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; to complete the following method:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn parseMethod(value: []const u8) ?Method {
    // ...
}

const Method = enum {
    get,
    put,
    post,
    head,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can also use &lt;code&gt;@bitCast&lt;/code&gt;. Let&#39;s take it step-by-step.&lt;/p&gt;

&lt;p&gt;The first thing we&#39;ll need to do is switch on &lt;code&gt;value.len&lt;/code&gt;. This is necessary because the three-byte &quot;GET&quot; will need to be &lt;code&gt;@bitCast&lt;/code&gt; to a &lt;code&gt;u24&lt;/code&gt;, whereas the four-byte &quot;POST&quot; needs to be &lt;code&gt;@bitCast&lt;/code&gt; to a &lt;code&gt;u32&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn parseMethod(value: []const u8) ?Method {
    switch (value.len) {
        3 =&gt; switch (@as(u24, @bitCast(value[0..3]))) {
            // TODO
            else =&gt; {},
        },
        4 =&gt; switch (@as(u32, @bitCast(value[0..4]))) {
            // TODO
            else =&gt; {},
        },
        else =&gt; {},
    }

    return null;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you try to run this code, you&#39;ll get a compilation error: &lt;em&gt;cannot @bitCast from &#39;*const [3]u8&#39;&lt;/em&gt;. &lt;code&gt;@bitCast&lt;/code&gt; works on actual bits, but when we slice our &lt;code&gt;[]const u8&lt;/code&gt; with a compile-time known range (&lt;code&gt;[0..3]&lt;/code&gt;), we get a pointer to an array. We can&#39;t &lt;code&gt;@bitCast&lt;/code&gt; a pointer, we can only &lt;code&gt;@bitCast&lt;/code&gt; actual bits of data. For this to work, we need to dereference the pointer, i.e. use: &lt;code&gt;value[0..3].*&lt;/code&gt;. This will turn our &lt;code&gt;*const [3]u8&lt;/code&gt; into a &lt;code&gt;const [3]u8&lt;/code&gt;.&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn parseMethod(value: []const u8) ?Method {
    switch (value.len) {
    // changed: we now dereference the value (.*)
        3 =&gt; switch (@as(u24, @bitCast(value[0..3].*))) {
            // TODO
            else =&gt; {},
        },
        // changed: we now dereference the value (.*)
        4 =&gt; switch (@as(u32, @bitCast(value[0..4].*))) {
            // TODO
            else =&gt; {},
        },
        else =&gt; {},
    }

    return null;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Also, you might have noticed the &lt;code&gt;@as(u24, ...)&lt;/code&gt; and &lt;code&gt;@as(u32, ...)&lt;/code&gt;. &lt;code&gt;@bitCast&lt;/code&gt;, like most of Zig&#39;s builtin functions, infers its return type. When we&#39;re assigning the result of a &lt;code&gt;@bitCast&lt;/code&gt; to a variable of a known type, i.e. &lt;code&gt;const x: i32 = @bitCast(data);&lt;/code&gt;, the return type of &lt;code&gt;i32&lt;/code&gt; is inferred. In the above &lt;code&gt;switch&lt;/code&gt;, we aren&#39;t assigning the result to a variable. We have to use &lt;code&gt;@as(u24, ...)&lt;/code&gt; in order for &lt;code&gt;@bitCast&lt;/code&gt; to know what it should be casting to (i.e. what its return type should be).&lt;/p&gt;

&lt;p&gt;The last thing we need to do is fill our switch blocks. Hopefully it&#39;s obvious that we can&#39;t just do:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
3 =&gt; switch (@as(u24, @bitCast(value[0..3].*))) {
    &quot;GET&quot; =&gt; return .get,
    &quot;PUT&quot; =&gt; return .put,
    else =&gt; {},
},
...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But you might be thinking that, while ugly, something like this might work:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
3 =&gt; switch (@as(u24, @bitCast(value[0..3].*))) {
    @as(u24, @bitCast(&quot;GET&quot;.*)) =&gt; return .get,
    @as(u24, @bitCast(&quot;PUT&quot;.*)) =&gt; return .put,
    else =&gt; {},
},
...&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because &lt;code&gt;&quot;GET&quot;&lt;/code&gt; and &lt;code&gt;&quot;PUT&quot;&lt;/code&gt; are string literals, they&#39;re null terminated and of type &lt;code&gt;*const [3:0]u8&lt;/code&gt;. When we dereference them, we get a &lt;code&gt;const [3:0]u8&lt;/code&gt;. It&#39;s close, but it means that the value is 4 bytes (&lt;code&gt;[4]u8{&#39;G&#39;, &#39;E&#39;, &#39;T&#39;, 0}&lt;/code&gt;) and thus cannot be &lt;code&gt;@bitCast&lt;/code&gt; into a &lt;code&gt;u24&lt;/code&gt;. This is ugly, but it works:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn parseMethod(value: []const u8) ?Method {
    switch (value.len) {
        3 =&gt; switch (@as(u24, @bitCast(value[0..3].*))) {
            @as(u24, @bitCast(@as([]const u8, &quot;GET&quot;)[0..3].*)) =&gt; return .get,
            @as(u24, @bitCast(@as([]const u8, &quot;PUT&quot;)[0..3].*)) =&gt; return .put,
            else =&gt; {},
        },
        4 =&gt; switch (@as(u32, @bitCast(value[0..4].*))) {
            @as(u32, @bitCast(@as([]const u8, &quot;HEAD&quot;)[0..4].*)) =&gt; return .head,
            @as(u32, @bitCast(@as([]const u8, &quot;POST&quot;)[0..4].*)) =&gt; return .post,
            else =&gt; {},
        },
        else =&gt; {},
    }
    return null;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That&#39;s a mouthful, so we can add a small function to help:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn parseMethod(value: []const u8) ?Method {
    switch (value.len) {
        3 =&gt; switch (@as(u24, @bitCast(value[0..3].*))) {
            asUint(u24, &quot;GET&quot;) =&gt; return .get,
            asUint(u24, &quot;PUT&quot;) =&gt; return .put,
            else =&gt; {},
        },
        4 =&gt; switch (@as(u32, @bitCast(value[0..4].*))) {
            asUint(u32, &quot;HEAD&quot;) =&gt; return .head,
            asUint(u32, &quot;POST&quot;) =&gt; return .post,
            else =&gt; {},
        },
        else =&gt; {},
    }
    return null;
}

pub fn asUint(comptime T: type, comptime string: []const u8) T {
    return @bitCast(string[0..string.len].*);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Like the verbose version, the trick is to cast our null-terminated string literal into a string slice, &lt;code&gt;[]const u8&lt;/code&gt;. By passing it through the &lt;code&gt;asUint&lt;/code&gt; function, we get this without needing to add the explicit &lt;code&gt;@as([]const u8)&lt;/code&gt;.&lt;/p&gt;
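&lt;p&gt;As a quick sanity check (a minimal sketch), the comptime &lt;code&gt;asUint&lt;/code&gt; constant matches a runtime &lt;code&gt;@bitCast&lt;/code&gt; of the same three bytes:&lt;/p&gt;

```zig
const std = @import("std");

pub fn asUint(comptime T: type, comptime string: []const u8) T {
    return @bitCast(string[0..string.len].*);
}

pub fn main() !void {
    const value: []const u8 = "GET";
    // the comptime-computed constant equals the runtime @bitCast of the input
    std.debug.assert(asUint(u24, "GET") == @as(u24, @bitCast(value[0..3].*)));
    std.debug.print("{d}\n", .{asUint(u24, "GET")});
}
```

This is exactly why the &lt;code&gt;switch&lt;/code&gt; prongs match: both sides of the comparison are the same bytes viewed as the same integer type.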

&lt;p&gt;There is a more advanced version of &lt;code&gt;asUint&lt;/code&gt; which doesn&#39;t take the uint type parameter (&lt;code&gt;T&lt;/code&gt;). If you think about it, the uint type can be inferred from the string&#39;s length:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn asUint(comptime string: []const u8) @Type(.{
    .int = .{
        // bits, not bytes, hence * 8
        .bits = string.len * 8,
        .signedness = .unsigned,
    },
}) {
    return @bitCast(string[0..string.len].*);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which allows us to call it with a single parameter: &lt;code&gt;asUint(&quot;GET&quot;)&lt;/code&gt;. This might be your first time seeing such a return type. The &lt;code&gt;@Type&lt;/code&gt; builtin is the opposite of &lt;code&gt;@typeInfo&lt;/code&gt;. The latter takes a type and returns information on it in the shape of a &lt;code&gt;std.builtin.Type&lt;/code&gt; union. Whereas &lt;code&gt;@Type&lt;/code&gt; takes a &lt;code&gt;std.builtin.Type&lt;/code&gt; and returns an actual usable type. One of these days I&#39;ll find the courage to blog about &lt;code&gt;std.builtin.Type&lt;/code&gt;!&lt;/p&gt;
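&lt;p&gt;The two directions can be seen side by side in a small sketch (assuming a recent Zig where &lt;code&gt;std.builtin.Type&lt;/code&gt;&#39;s fields are lowercase, as in the &lt;code&gt;.int&lt;/code&gt; used above):&lt;/p&gt;

```zig
const std = @import("std");

pub fn main() !void {
    // @typeInfo: type -> std.builtin.Type (a union describing the type)
    std.debug.print("{d}\n", .{@typeInfo(u24).int.bits}); // 24

    // @Type: std.builtin.Type -> type (the reverse direction)
    const T = @Type(.{ .int = .{ .bits = 24, .signedness = .unsigned } });
    std.debug.print("{s}\n", .{@typeName(T)}); // u24
}
```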

&lt;p&gt;As a final note, some people dislike the look of this sort of return type and would rather encapsulate the logic in its own function. This is the same:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn asUint(comptime string: []const u8) AsUintReturn(string) {
    return @bitCast(string[0..string.len].*);
}

// Remember that, in Zig, by convention, a function should be
// PascalCase if it returns a type (because types are PascalCase).
fn AsUintReturn(comptime string: []const u8) type {
    return @Type(.{
        .int = .{
            // bits, not bytes, hence * 8
            .bits = string.len * 8,
            .signedness = .unsigned,
        },
    });
}&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Conclusion&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;Of the three approaches, this is the least readable and least approachable. Is it worth it? It depends on your input and the values you&#39;re comparing against. In my benchmarks, using &lt;code&gt;@bitCast&lt;/code&gt; performs roughly the same as &lt;code&gt;std.meta.stringToEnum&lt;/code&gt;. But there are some cases where &lt;code&gt;@bitCast&lt;/code&gt; can outperform &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; by as much as 50%. Perhaps that&#39;s the real value of this approach: the performance is less dependent on the input or the values being matched against.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Switching on Strings in Zig</title>
		<link href="https://www.openmymind.net/Switching-On-Strings-In-Zig/"/>
		<updated>2025-02-13T00:00:00Z</updated>
		<id>/Switching-On-Strings-In-Zig/</id>
		<content type="html">
			
&lt;p&gt;Newcomers to Zig will quickly learn that you can&#39;t switch on a string (i.e. &lt;code&gt;[]const u8&lt;/code&gt;). The following code gives us the unambiguous error message &lt;em&gt;cannot switch on strings&lt;/em&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
switch (color) {
    &quot;red&quot; =&gt; {},
    &quot;blue&quot; =&gt; {},
    &quot;green&quot; =&gt; {},
    &quot;pink&quot; =&gt; {},
    else =&gt; {},
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I&#39;ve seen two explanations for why this isn&#39;t supported. The first is that there&#39;s ambiguity around string identity. Are two strings only considered equal if they point to the same address? Is a null-terminated string the same as its non-null-terminated counterpart? The other reason is that users of &lt;code&gt;switch&lt;/code&gt; [apparently] expect &lt;a href=&quot;https://en.wikipedia.org/wiki/Branch_table&quot;&gt;certain optimizations&lt;/a&gt; which are not possible with strings (although, presumably, these same users would know that such optimizations aren&#39;t possible with strings).&lt;/p&gt;

&lt;p&gt;Instead, in Zig, there are two common methods for comparing strings.&lt;/p&gt;

&lt;h3 id=&quot;mem_eql&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#mem_eql&quot; aria-hidden=&quot;true&quot;&gt;std.mem.eql&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The most common way to compare strings is using &lt;code&gt;std.mem.eql&lt;/code&gt; with &lt;code&gt;if / else if / else&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if (std.mem.eql(u8, color, &quot;red&quot;) == true) {

} else if (std.mem.eql(u8, color, &quot;blue&quot;) == true) {

} else if (std.mem.eql(u8, color, &quot;green&quot;) == true) {

} else if (std.mem.eql(u8, color, &quot;pink&quot;) == true) {

} else {

}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The implementation for &lt;code&gt;std.mem.eql&lt;/code&gt; depends on what&#39;s being compared. Specifically, it has an optimized code path when comparing strings. Although that&#39;s what we&#39;re interested in, let&#39;s look at the non-optimized version:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn eql(comptime T: type, a: []const T, b: []const T) bool {
    if (a.len != b.len) return false;
    if (a.len == 0 or a.ptr == b.ptr) return true;

    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whether we&#39;re dealing with slices of bytes or some other type, if they&#39;re of different lengths, they can&#39;t be equal. Once we know that they&#39;re the same length, if they point to the same memory, then they must be equal. I&#39;m not a fan of this second check; it might be cheap, but the case it covers seems quite uncommon. Once those initial checks are done, we compare each element (each byte of our string) one at a time.&lt;/p&gt;

&lt;p&gt;The optimized version, which &lt;em&gt;is&lt;/em&gt; used for strings, is &lt;a href=&quot;https://github.com/ziglang/zig/blob/5b9b5e45cb710ddaad1a97813d1619755eb35a98/lib/std/mem.zig#L720&quot;&gt;much more involved&lt;/a&gt;. But it&#39;s fundamentally the same as the above with &lt;a href=&quot;https://www.openmymind.net/SIMD-With-Zig/&quot;&gt;SIMD&lt;/a&gt; to compare multiple bytes at once.&lt;/p&gt;

&lt;p&gt;The nature of string comparison means that real-world performance is dependent on the values being compared. We know that if we have 100 &lt;code&gt;if / else if&lt;/code&gt; branches then, in the worst case, we&#39;ll need to call &lt;code&gt;std.mem.eql&lt;/code&gt; 100 times. But comparing strings of different lengths or strings which differ early will be significantly faster. For example, consider these three cases:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
{
    const str1 = &quot;a&quot; ** 10_000 ++ &quot;1&quot;;
    const str2 = &quot;a&quot; ** 10_000 ++ &quot;2&quot;;
    _ = std.mem.eql(u8, str1, str2);
}

{
    const str1 = &quot;1&quot; ++ &quot;a&quot; ** 10_000;
    const str2 = &quot;2&quot; ++ &quot;a&quot; ** 10_000;
    _ = std.mem.eql(u8, str1, str2);
}

{
    const str1 = &quot;a&quot; ** 999_999;
    const str2 = &quot;a&quot; ** 1_000_000;
    _ = std.mem.eql(u8, str1, str2);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For me, the first comparison takes ~270ns, whereas the other two take ~20ns - despite the last one involving much larger strings. The second case is faster because the difference is early in the string allowing the &lt;code&gt;for&lt;/code&gt; loop to return after only one comparison. The third case is faster because the strings are of a different length: &lt;code&gt;false&lt;/code&gt; is returned by the initial &lt;code&gt;len&lt;/code&gt; check.&lt;/p&gt;

&lt;h3 id=&quot;string_to_enum&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#string_to_enum&quot; aria-hidden=&quot;true&quot;&gt;std.meta.stringToEnum&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;std.meta.stringToEnum&lt;/code&gt; takes an enum type and a string value and returns the corresponding enum value or null. This code prints &quot;you picked: blue&quot;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

const Color = enum {
    red,
    blue,
    green,
    pink,
};

pub fn main() !void {
    const color = std.meta.stringToEnum(Color, &quot;blue&quot;) orelse {
        return error.InvalidChoice;
    };

    switch (color) {
        .red =&gt; std.debug.print(&quot;you picked: red&#92;n&quot;, .{}),
        .blue =&gt; std.debug.print(&quot;you picked: blue&#92;n&quot;, .{}),
        .green =&gt; std.debug.print(&quot;you picked: green&#92;n&quot;, .{}),
        .pink =&gt; std.debug.print(&quot;you picked: pink&#92;n&quot;, .{}),
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you don&#39;t need the enum type (i.e. &lt;code&gt;Color&lt;/code&gt;) beyond this check, you can leverage Zig&#39;s anonymous types. This is equivalent:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    const color = std.meta.stringToEnum(enum {
        red,
        blue,
        green,
        pink,
    }, &quot;blue&quot;) orelse return error.InvalidChoice;

    switch (color) {
        .red =&gt; std.debug.print(&quot;you picked: red&#92;n&quot;, .{}),
        .blue =&gt; std.debug.print(&quot;you picked: blue&#92;n&quot;, .{}),
        .green =&gt; std.debug.print(&quot;you picked: green&#92;n&quot;, .{}),
        .pink =&gt; std.debug.print(&quot;you picked: pink&#92;n&quot;, .{}),
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It&#39;s &lt;strong&gt;not&lt;/strong&gt; obvious how this should perform versus the straightforward &lt;code&gt;if / else if&lt;/code&gt; approach. Yes, we now have a &lt;code&gt;switch&lt;/code&gt; statement that the compiler can [hopefully] optimize, but &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; still has to convert our input, &lt;code&gt;&quot;blue&quot;&lt;/code&gt;, into an enum.&lt;/p&gt;

&lt;p&gt;The implementation of &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; depends on the number of possible values, i.e. the number of enum values. Currently, if there are more than 100 values, it falls back to using the same &lt;code&gt;if / else if&lt;/code&gt; that we explored above. Thus, with more than 100 values it does the &lt;code&gt;if / else if&lt;/code&gt; check PLUS the switch. This should &lt;a href=&quot;https://github.com/ziglang/zig/issues/3863&quot;&gt;improve in the future&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;However, with 100 or fewer values, &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; creates a comptime &lt;code&gt;std.StaticStringMap&lt;/code&gt; which can then be used to lookup the value. &lt;code&gt;std.StaticStringMap&lt;/code&gt; isn&#39;t something we&#39;ve looked at before. It&#39;s a specialized map that buckets keys by their length. Its advantage over Zig&#39;s &lt;a href=&quot;https://www.openmymind.net/Zigs-HashMap-Part-1/&quot;&gt;other hash maps&lt;/a&gt; is that it can be constructed at compile-time. For our &lt;code&gt;Color&lt;/code&gt; enum, the internal state of a &lt;code&gt;StaticStringMap&lt;/code&gt; would look something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// keys are ordered by length
keys:     [&quot;red&quot;, &quot;blue&quot;, &quot;pink&quot;, &quot;green&quot;];

// values[N] corresponds to keys[N]
values:   [.red, .blue, .pink, .green];

// What&#39;s this though?
indexes:  [0, 0, 0, 0, 1, 3];&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It might not be obvious how &lt;code&gt;indexes&lt;/code&gt; is used. Let&#39;s write our own &lt;code&gt;get&lt;/code&gt; implementation, simulating the above &lt;code&gt;StaticStringMap&lt;/code&gt; state:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn get(str: []const u8) ?Color {
    // Simulate the state of the StaticStringMap which
    // stringToMeta built at compile-time.
    const keys = [_][]const u8{&quot;red&quot;, &quot;blue&quot;, &quot;pink&quot;, &quot;green&quot;};
    const values = [_]Color{.red, .blue, .pink, .green};
    const indexes = [_]usize{0, 0, 0, 0, 1, 3};

    if (str.len &gt;= indexes.len) {
        // our map has no strings of this length
        return null;
    }

    var index = indexes[str.len];
    while (index &lt; keys.len) {
        const key = keys[index];

        if (key.len != str.len) {
            // we&#39;ve gone into the next bucket, everything after
            // this is longer and thus can&#39;t be a match
            return null;
        }

        if (std.mem.eql(u8, key, str)) {
            return values[index];
        }
        index += 1;
    }
    return null;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Take note that &lt;code&gt;keys&lt;/code&gt; are ordered by length. As a naive implementation, we could iterate through the keys until we either find a match or find a key with a longer length. Once we find a key with a longer length, we can stop searching, as all remaining candidates won&#39;t match - they&#39;ll all be too long. &lt;code&gt;StaticStringMap&lt;/code&gt; goes a step further and records the index within &lt;code&gt;keys&lt;/code&gt; where entries of a specific length begin. &lt;code&gt;indexes[3]&lt;/code&gt; tells us where to start looking for keys with a length of 3 (at index 0). &lt;code&gt;indexes[5]&lt;/code&gt; tells us where to start looking for keys with a length of 5 (at index 3).&lt;/p&gt;
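&lt;p&gt;You don&#39;t have to go through &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; to get this behavior; &lt;code&gt;StaticStringMap&lt;/code&gt; can be built directly. A minimal sketch (assuming a recent Zig where the type is exposed as &lt;code&gt;std.StaticStringMap&lt;/code&gt; with &lt;code&gt;initComptime&lt;/code&gt;):&lt;/p&gt;

```zig
const std = @import("std");

const Color = enum { red, blue, green, pink };

// built entirely at compile time; internally, keys are bucketed by length
const color_map = std.StaticStringMap(Color).initComptime(.{
    .{ "red", .red },
    .{ "blue", .blue },
    .{ "green", .green },
    .{ "pink", .pink },
});

pub fn main() !void {
    // get returns an optional: .green for a hit, null for a miss
    std.debug.print("{any}\n", .{color_map.get("green")});
    std.debug.print("{any}\n", .{color_map.get("mauve")});
}
```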

&lt;p&gt;Above, we fall back to using &lt;code&gt;std.mem.eql&lt;/code&gt; for any key which is the same length as our target string. &lt;code&gt;StaticStringMap&lt;/code&gt; uses its own &quot;optimized&quot; version:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn defaultEql(a: []const u8, b: []const u8) bool {
    if (a.ptr == b.ptr) return true;
    for (a, b) |a_elem, b_elem| {
        if (a_elem != b_elem) return false;
    }
    return true;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is the same as the simple &lt;code&gt;std.mem.eql&lt;/code&gt; implementation, minus the length check. This is done because the &lt;code&gt;eql&lt;/code&gt; within our &lt;code&gt;while&lt;/code&gt; loop is only ever called for values with matching length. On the flip side, &lt;code&gt;StaticStringMap&lt;/code&gt;&#39;s &lt;code&gt;eql&lt;/code&gt; doesn&#39;t use SIMD, so it would be slower for large strings.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;&lt;code&gt;StaticStringMap&lt;/code&gt; is a wrapper around &lt;code&gt;StaticStringMapWithEql&lt;/code&gt;, which accepts a custom &lt;code&gt;eql&lt;/code&gt; function, so if you &lt;em&gt;did&lt;/em&gt; want to use it for long strings or some other purpose, you have a reasonable amount of flexibility. You even have the option to use &lt;code&gt;std.static_string_map.eqlAsciiIgnoreCase&lt;/code&gt; for ASCII-aware case-insensitive comparison.&lt;/p&gt;&lt;/aside&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Conclusion&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In my own benchmarks, in general, I&#39;ve seen little difference between the two approaches. It does seem like &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; is generally as fast or faster. It also results in more concise code and is ideal if the resulting enum is useful beyond the comparison.&lt;/p&gt;

&lt;p&gt;You usually don&#39;t have long enum values, so the lack of SIMD-optimization isn&#39;t a concern. However, if you&#39;re considering building your own &lt;code&gt;StaticStringMap&lt;/code&gt; at compile time with long keys, you should benchmark with a custom &lt;code&gt;eql&lt;/code&gt; function based on &lt;code&gt;std.mem.eql&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;We could manually bucket those &lt;code&gt;if / else if&lt;/code&gt; branches ourselves, similar to what the &lt;code&gt;StaticStringMap&lt;/code&gt; does. Something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
switch (color.len) {
    3 =&gt; {
        if (std.mem.eql(u8, color, &quot;red&quot;) == true) {
            // ...
            return;
        }
    },
    4 =&gt; {
        if (std.mem.eql(u8, color, &quot;blue&quot;) == true) {
            // ...
            return;
        }
        if (std.mem.eql(u8, color, &quot;pink&quot;) == true) {
            // ...
            return;
        }
    },
    5 =&gt; {
        if (std.mem.eql(u8, color, &quot;green&quot;) == true) {
            // ...
            return;
        }
    },
    else =&gt; {},
}
// not found&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Ughhh. This highlights the convenience of using &lt;code&gt;std.meta.stringToEnum&lt;/code&gt; to generate similar code. Also, do remember that &lt;code&gt;std.mem.eql&lt;/code&gt; quickly discards strings of different lengths, which helps to explain why both approaches generally perform similarly.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Using Generics to Inject Stubs when Testing</title>
		<link href="https://www.openmymind.net/Using-Generics-To-Inject-Stubs-When-Testing/"/>
		<updated>2025-02-07T00:00:00Z</updated>
		<id>/Using-Generics-To-Inject-Stubs-When-Testing/</id>
		<content type="html">
			
&lt;p&gt;I have an unapologetic and, I&#39;d like to think, pragmatic take on testing. I want tests to run fast, not be flaky and not be brittle. By &quot;flaky&quot;, I mean tests that randomly fail. By &quot;brittle&quot;, I mean tests that break due to seemingly unrelated changes in the code.&lt;/p&gt;

&lt;p&gt;I&#39;m happy that the definition of &quot;unit tests&quot; has broadened over the years: it&#39;s now acceptable to have &quot;unit tests&quot; do more, e.g. hitting a database. I call the period of time where I (we?) over-relied on dependency injection and mocking frameworks, the &quot;dark ages&quot;. Still, I do believe that stubs and similar testing tools can be useful in specific cases.&lt;/p&gt;

&lt;p&gt;One pattern that I&#39;ve recently leveraged is using a generic to inject a stub for testing. For example, imagine we have a &lt;code&gt;Client&lt;/code&gt; struct which encapsulates communication over a TCP socket:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const Client = struct {
    stream: net.Stream,

    // Prefix the message with its 4-byte length
    pub fn write(self: Client, data: []const u8) !void {
        var header: [4]u8 = undefined;
        std.mem.writeInt(u32, &amp;header, @intCast(data.len), .big);
        try self.stream.writeAll(&amp;header);
        return self.stream.writeAll(data);
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Admittedly, a unit test for this specific code doesn&#39;t make much sense to me. The code itself is trivial, but setting up a test for it wouldn&#39;t be straightforward. But what if we did want to test it? Maybe we had a bit more logic with an edge-case that might be hard to reproduce in a broader end-to-end test. Here&#39;s how I&#39;d do it:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const Client = ClientT(net.Stream);

fn ClientT(comptime S: type) type {
    return struct {
        stream: S,

        const Self = @This();

        // Prefix the message with its 4-byte length
        pub fn write(self: Self, data: []const u8) !void {
            var header: [4]u8 = undefined;
            std.mem.writeInt(u32, &amp;header, @intCast(data.len), .big);
            try self.stream.writeAll(&amp;header);
            return self.stream.writeAll(data);
        }
    };
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The generic, &lt;code&gt;ClientT&lt;/code&gt;, exists only for testing purposes. To shield everything else from this implementation detail, we continue to expose a &lt;code&gt;Client&lt;/code&gt; which behaves exactly like our original version, i.e. it&#39;s a &lt;code&gt;Client&lt;/code&gt; that operates on a &lt;code&gt;net.Stream&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we can test our &lt;code&gt;Client&lt;/code&gt; against something other than a &lt;code&gt;net.Stream&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const TestStream = struct {
    written: std.ArrayListUnmanaged(u8) = .{},

    const allocator = testing.allocator;

    fn deinit(self: *TestStream) void {
        self.written.deinit(allocator);
    }

    pub fn writeAll(self: *TestStream, data: []const u8) !void {
        return self.written.appendSlice(allocator, data);
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Our &lt;code&gt;TestStream&lt;/code&gt; records all data written. This will allow our tests to assert that, for a given input (or inputs), the correct data was written. Here&#39;s a test:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
test &quot;Client: write&quot; {
    var stream = TestStream{};
    defer stream.deinit();

    const client: ClientT(*TestStream) = .{.stream = &amp;stream};

    try client.write(&quot;over 9000!&quot;);

    // we still need to write this!
    try stream.expectWritten(&amp;.{
        [_]u8{0, 0, 0, 10} ++ &quot;over 9000!&quot;,
    });
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By injecting our &lt;code&gt;TestStream&lt;/code&gt; into the client, we&#39;re able to control the interaction between the client and the &quot;stream&quot;. Our test calls &lt;code&gt;client.write&lt;/code&gt;, like normal code would, and, in our tests, this ends up calling our &lt;code&gt;TestStream.writeAll&lt;/code&gt; method. To finish off this example, we still need to implement the &lt;code&gt;expectWritten&lt;/code&gt; function, which the above test calls:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const TestStream = struct {
    // add this field
    read_pos: usize = 0,

    written: std.ArrayListUnmanaged(u8) = .{},

    // ... everything else is the same ...

    // add this method
    fn expectWritten(self: *TestStream, expected: []const []const u8) !void {
        var read_pos = self.read_pos;
        for (expected) |e| {
            // slice from the absolute position of the data not yet asserted
            const written = self.written.items[read_pos..];
            try testing.expectEqual(true, written.len &gt;= e.len);
            try testing.expectEqualSlices(u8, e, written[0..e.len]);
            read_pos += e.len;
        }
        self.read_pos = read_pos;
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are a lot of different ways we can assert that the expected data was written. The above implementation is straightforward, but by adding the &lt;code&gt;read_pos&lt;/code&gt; field, we&#39;re able to write and assert incrementally.&lt;/p&gt;
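
&lt;p&gt;For example, assuming the &lt;code&gt;ClientT&lt;/code&gt; and &lt;code&gt;TestStream&lt;/code&gt; sketched above, a single test can interleave writes and assertions; because &lt;code&gt;read_pos&lt;/code&gt; advances, each &lt;code&gt;expectWritten&lt;/code&gt; call only examines the data written since the previous call:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
test &quot;Client: incremental write&quot; {
    var stream = TestStream{};
    defer stream.deinit();

    const client: ClientT(*TestStream) = .{.stream = &amp;stream};

    try client.write(&quot;over&quot;);
    // only asserts the first framed message
    try stream.expectWritten(&amp;.{
        [_]u8{0, 0, 0, 4} ++ &quot;over&quot;,
    });

    try client.write(&quot;9000!&quot;);
    // read_pos has advanced; only the second message is checked
    try stream.expectWritten(&amp;.{
        [_]u8{0, 0, 0, 5} ++ &quot;9000!&quot;,
    });
}&lt;/code&gt;&lt;/pre&gt;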

&lt;p&gt;Again, this isn&#39;t a pattern that I use often, and there are different ways to achieve the same thing. For example, we could extract the message framing logic (in our case, the 4-byte length header) into its own struct. But I believe every alternative has its own compromises. Therefore, it comes down to picking the best option for a given problem, and hopefully this is another tool you can add to your toolbelt.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;If you&#39;re wondering why we need to prefix the message with its length, you might be interested in my &lt;a href=&quot;https://www.openmymind.net/TCP-Server-In-Zig-Part-2-Message-Boundaries/&quot;&gt;TCP Message Boundaries post&lt;/a&gt;. If you&#39;re wondering whether the 2 &lt;code&gt;writeAll&lt;/code&gt; calls, which will result in at least 2 system calls, can be merged, you might be interested to &lt;a href=&quot;https://www.openmymind.net/TCP-Server-In-Zig-Part-3-Minimizing-Writes-and-Reads/#writev&quot;&gt;learn about &lt;code&gt;writev&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;&lt;/aside&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>In Zig, What&#39;s a Writer?</title>
		<link href="https://www.openmymind.net/In-Zig-Whats-a-Writer/"/>
		<updated>2025-01-28T00:00:00Z</updated>
		<id>/In-Zig-Whats-a-Writer/</id>
		<content type="html">
			
&lt;blockquote&gt;&lt;p&gt;As of mid-July 2025, development on a new &lt;code&gt;Io&lt;/code&gt; namespace is underway. If you&#39;re using Zig 0.14.1 or earlier, this post is still relevant. I&#39;ve written a bit about &lt;a href=&quot;https://www.openmymind.net/Zigs-New-Writer/&quot;&gt;Zig&#39;s new Writer&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I find Zig&#39;s idea of a &quot;writer&quot; confusing. This is probably because there are three different types, each trying to compensate for compromises made by the others. Let&#39;s try to understand what each is and how it fits into a bigger whole.&lt;/p&gt;

&lt;h3 id=&quot;anytype&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#anytype&quot; aria-hidden=&quot;true&quot;&gt;writer: anytype&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The first writer that you&#39;re likely to run into is a &lt;code&gt;writer: anytype&lt;/code&gt;, with the most visible cases found in the &lt;code&gt;std.fmt&lt;/code&gt; package, i.e. &lt;code&gt;std.fmt.format&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I&#39;ve written about &lt;a href=&quot;https://www.openmymind.net/learning_zig/coding_in_zig/#anytype&quot;&gt;anytype&lt;/a&gt; before. As a quick recap, think of it as a template. For example, if we have this code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn writeN(writer: anytype, data: []const u8, n: usize) !void {
    var i: usize = 0;
    while (i &lt; n) : (i += 1) {
        try writer.writeAll(data);
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A copy of this function will be created for every type of &lt;code&gt;writer&lt;/code&gt; that is used. Given this invocation:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var logger = MyLogger{};
try writeN(logger, &quot;.&quot;, 10);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&#39;d end up with this copy of &lt;code&gt;writeN&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// anytype -&gt; MyLogger
fn writeN(writer: MyLogger, data: []const u8, n: usize) !void {
    var i: usize = 0;
    while (i &lt; n) : (i += 1) {
        try writer.writeAll(data);
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If &lt;code&gt;MyLogger&lt;/code&gt; didn&#39;t implement the necessary &lt;code&gt;writeAll([]const u8) !void&lt;/code&gt; method, we&#39;d get a compiler error - just like we&#39;d expect if we wrote the &lt;code&gt;writeN(writer: MyLogger, ...)&lt;/code&gt; function ourselves.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;anytype&lt;/code&gt; is super useful and has zero runtime overhead. But there are a few downsides. First, it can make binaries larger and compilation slower. In most cases, there&#39;s only ever one or maybe a few different types used, so it isn&#39;t an issue. Second, it&#39;s a documentation black hole. A function that takes a &lt;code&gt;writer: anytype&lt;/code&gt; likely expects one or more of the following methods:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
write(data: []const u8) !void
writeAll(data: []const u8) !void
writeByte(b: u8) !void
writeByteNTimes(b: u8, n: usize) !void
writeBytesNTimes(data: []const u8, n: usize) !void&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But this is just a convention based on the fact that the parameter name is &lt;code&gt;writer&lt;/code&gt;. You either have to go through the source code and see how &lt;code&gt;writer&lt;/code&gt; is used, or let the compiler tell you which function is expected.&lt;/p&gt;
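
&lt;p&gt;If you want that requirement to be explicit - and the compile error to be friendlier - one option is a &lt;code&gt;comptime&lt;/code&gt; check at the top of the function. This is my own sketch, not something the standard library does, and it assumes &lt;code&gt;writer&lt;/code&gt; is a struct passed by value:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn writeN(writer: anytype, data: []const u8, n: usize) !void {
    // make the duck-typed contract explicit
    comptime {
        if (!@hasDecl(@TypeOf(writer), &quot;writeAll&quot;)) {
            @compileError(@typeName(@TypeOf(writer)) ++ &quot; must provide a writeAll method&quot;);
        }
    }

    var i: usize = 0;
    while (i &lt; n) : (i += 1) {
        try writer.writeAll(data);
    }
}&lt;/code&gt;&lt;/pre&gt;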

&lt;p&gt;But the main issue with &lt;code&gt;anytype&lt;/code&gt; is that it can only be used as a function parameter. This isn&#39;t valid:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const Opts = struct {
    output: anytype
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For that, we need something else.&lt;/p&gt;


&lt;h3 id=&quot;anywriter&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#anywriter&quot; aria-hidden=&quot;true&quot;&gt;std.io.AnyWriter&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;std.io.AnyWriter&lt;/code&gt; type is the closest thing Zig has to a writer interface. We&#39;ve covered &lt;a href=&quot;https://www.openmymind.net/Zig-Interfaces/&quot;&gt;Zig interfaces&lt;/a&gt; before and &lt;code&gt;AnyWriter&lt;/code&gt; is essentially the simplest version we looked at, i.e.:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const AnyWriter = struct {
  context: *const anyopaque,
  writeFn: *const fn (context: *const anyopaque, bytes: []const u8) anyerror!usize,

    pub fn write(self: AnyWriter, bytes: []const u8) anyerror!usize {
        return self.writeFn(self.context, bytes);
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unlike other languages where interfaces are purely a contract with no implementation, Zig tends to stuff a lot of behavior into its interfaces. For example, &lt;code&gt;AnyWriter&lt;/code&gt; implements &lt;code&gt;writeAll&lt;/code&gt; which relies on the above &lt;code&gt;write&lt;/code&gt; function, and &lt;code&gt;writeByteNTimes&lt;/code&gt; which relies on &lt;code&gt;writeAll&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn writeAll(self: AnyWriter, bytes: []const u8) anyerror!void {
    var index: usize = 0;
    while (index != bytes.len) {
        index += try self.write(bytes[index..]);
    }
}

pub fn writeByteNTimes(self: AnyWriter, byte: u8, n: usize) anyerror!void {
    var bytes: [256]u8 = undefined;
    @memset(bytes[0..], byte);

    var remaining: usize = n;
    while (remaining &gt; 0) {
        const to_write = @min(remaining, bytes.len);
        try self.writeAll(bytes[0..to_write]);
        remaining -= to_write;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now this approach can have &lt;a href=&quot;https://github.com/ziglang/zig/issues/17985&quot;&gt;performance issues&lt;/a&gt;, since there&#39;s no way for an implementation to provide, for example, an optimized &lt;code&gt;writeByteNTimes&lt;/code&gt;. Still, &lt;code&gt;AnyWriter&lt;/code&gt; fills the gap left by &lt;code&gt;anytype&lt;/code&gt;&#39;s usage limitations.&lt;/p&gt;

&lt;h3 id=&quot;genericwriter&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#genericwriter&quot; aria-hidden=&quot;true&quot;&gt;std.io.GenericWriter&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;It would be reasonable to think that when you call &lt;code&gt;file.writer()&lt;/code&gt; or &lt;code&gt;array_list.writer()&lt;/code&gt;, you&#39;re getting an &lt;code&gt;std.io.AnyWriter&lt;/code&gt; interface. In reality though, you&#39;re getting a &lt;code&gt;std.io.GenericWriter&lt;/code&gt;, for which &lt;code&gt;std.io.Writer&lt;/code&gt; is an alias. To understand what this type is, we need to look at the &lt;code&gt;writeFn&lt;/code&gt; field of &lt;code&gt;AnyWriter&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
*const fn (context: *const anyopaque, bytes: []const u8) anyerror!usize&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Specifically, notice the &lt;code&gt;anyerror&lt;/code&gt; return type. Unlike an inferred error type (i.e. &lt;code&gt;!usize&lt;/code&gt;), which implicitly creates an error set based on the function&#39;s possible error values, &lt;code&gt;anyerror&lt;/code&gt; is an implicitly created error set for the &lt;strong&gt;entire project&lt;/strong&gt;. This means that even though your specific writer&#39;s &lt;code&gt;write&lt;/code&gt; function might only be able to return an &lt;code&gt;error.OutOfMemory&lt;/code&gt;, the &lt;code&gt;AnyWriter&lt;/code&gt; interface will expose any possible error your program might return. In many cases, that won&#39;t be an issue. But projects with strict reliability requirements might need/want to handle every error explicitly, especially when we&#39;re talking about something like writing data. Think of a database persisting a WAL file to disk, for example.&lt;/p&gt;
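
&lt;p&gt;To make that concrete, here&#39;s a small sketch: the narrow error set survives a direct call, but once the same function is reached through an &lt;code&gt;anyerror&lt;/code&gt; signature, an exhaustive &lt;code&gt;switch&lt;/code&gt; needs an &lt;code&gt;else&lt;/code&gt; branch:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn mayFail() error{OutOfMemory}!void {
    return error.OutOfMemory;
}

fn widened() anyerror!void {
    // the narrow set silently coerces into the project-wide anyerror set
    return mayFail();
}

pub fn main() void {
    // direct call: the compiler knows OutOfMemory is the only possibility
    mayFail() catch |err| switch (err) {
        error.OutOfMemory =&gt; {},
    };

    // through anyerror, that knowledge is gone
    widened() catch |err| switch (err) {
        error.OutOfMemory =&gt; {},
        else =&gt; {},
    };
}&lt;/code&gt;&lt;/pre&gt;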

&lt;p&gt;Thus we have &lt;code&gt;std.io.GenericWriter&lt;/code&gt; which, as part of its generic contract, takes an error type. Here&#39;s what the generic parameters look like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn GenericWriter(
    comptime Context: type,
    comptime WriteError: type,
    comptime writeFn: fn (context: Context, bytes: []const u8) WriteError!usize,
) type {
    ...
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice that the &lt;code&gt;writeFn&lt;/code&gt;&#39;s return value is now a typed error - with the type being provided by the implementation.&lt;/p&gt;

&lt;p&gt;Let&#39;s look at some examples. Here&#39;s what an implementation that returns an &lt;code&gt;AnyWriter&lt;/code&gt; might look like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const DummyWriterAny = struct {
    fn write(_: *const anyopaque, data: []const u8) error{OutOfMemory}!usize {
        _ = data;
        return error.OutOfMemory;
    }

    pub fn writer(self: *DummyWriterAny) std.io.AnyWriter {
        return .{
            .context = self,
            .writeFn = write,
        };
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even though our &lt;code&gt;write&lt;/code&gt; function returns an explicit error, that type information is lost when we convert our &lt;code&gt;DummyWriterAny&lt;/code&gt; to a &lt;code&gt;AnyWriter&lt;/code&gt;. Here&#39;s a similar implementation but for a &lt;code&gt;GenericWriter&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub const DummyWriterGen = struct {
    fn write(_: *DummyWriterGen, data: []const u8) error{OutOfMemory}!usize {
        _ = data;
        return error.OutOfMemory;
    }

    pub const Writer = std.io.Writer(*DummyWriterGen, error{OutOfMemory}, write);

    pub fn writer(self: *DummyWriterGen) Writer {
        return .{.context = self};
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;aside&gt;&lt;p&gt;Declaring a public &lt;code&gt;Writer&lt;/code&gt; type within the structure isn&#39;t necessary, but it is common.&lt;/p&gt;&lt;/aside&gt;

&lt;p&gt;Now when we convert our &lt;code&gt;DummyWriterGen&lt;/code&gt; to an &lt;code&gt;std.io.GenericWriter&lt;/code&gt;, the error type is preserved.&lt;/p&gt;
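
&lt;p&gt;We can verify this with a test that leans on the narrow error type (using the &lt;code&gt;DummyWriterGen&lt;/code&gt; from above):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
test &quot;GenericWriter preserves the error set&quot; {
    var dummy = DummyWriterGen{};
    const w = dummy.writer();

    // w.write&#39;s return type is error{OutOfMemory}!usize, not anyerror!usize
    try std.testing.expectError(error.OutOfMemory, w.write(&quot;over 9000!&quot;));
}&lt;/code&gt;&lt;/pre&gt;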

&lt;p&gt;&lt;strong&gt;However&lt;/strong&gt;, it&#39;s important to realize that &lt;code&gt;GenericWriter&lt;/code&gt; isn&#39;t just a better, more type-aware version of &lt;code&gt;AnyWriter&lt;/code&gt;. One is a generic; the other is an interface. Specifically, a &lt;code&gt;GenericWriter&lt;/code&gt; for a &lt;code&gt;File&lt;/code&gt; is a different type than a &lt;code&gt;GenericWriter&lt;/code&gt; for an &lt;code&gt;ArrayList(u8)&lt;/code&gt;. It isn&#39;t an interface and can&#39;t be used like one.

&lt;/p&gt;&lt;h3 id=&quot;conclusion&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#conclusion&quot; aria-hidden=&quot;true&quot;&gt;Bringing it all Together&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;For everyday programming, what all of this means is that if you have a &lt;code&gt;File&lt;/code&gt;, &lt;code&gt;ArrayList(u8)&lt;/code&gt;, &lt;code&gt;Stream&lt;/code&gt; or any other type which has a &lt;code&gt;writer&lt;/code&gt; method, you&#39;re almost certainly getting a &lt;code&gt;GenericWriter&lt;/code&gt;. This writer can usually be passed to a function with a &lt;code&gt;writer: anytype&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    var arr = std.ArrayList(u8).init(allocator);

    // The first parameter to format is a writer: anytype
    // we can pass our arr.writer() to it.
    try std.fmt.format(arr.writer(), &quot;over {d}!!&quot;, .{9000});

    std.debug.print(&quot;{s}&#92;n&quot;, .{arr.items});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I say &quot;usually&quot;, because there&#39;s no guarantee; it relies on all of us agreeing that a variable named &lt;code&gt;writer&lt;/code&gt; of type &lt;code&gt;anytype&lt;/code&gt; only ever uses methods available to a &lt;code&gt;GenericWriter&lt;/code&gt;. Sarcasm aside, it does mostly work.&lt;/p&gt;

&lt;p&gt;For cases where an &lt;code&gt;std.io.AnyWriter&lt;/code&gt; is needed, such as storing an implementation-independent writer in a struct field, you&#39;ll need to use an &lt;code&gt;AnyWriter&lt;/code&gt;, which you can easily get by calling &lt;code&gt;any()&lt;/code&gt; on your &lt;code&gt;GenericWriter&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// a slightly dumb example
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    var arr = std.ArrayList(u8).init(allocator);
    const opts = Opts{
        .output = arr.writer().any(),
    };
    try write(&quot;hello world&quot;, opts);
}

const Opts = struct {
    output: std.io.AnyWriter,
};

fn write(data: []const u8, opts: Opts) !void {
    _ = try opts.output.write(data);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If I understand correctly, &lt;a href=&quot;https://github.com/ziglang/zig/pull/17344&quot;&gt;the motivation for this design&lt;/a&gt; was reducing code bloat while providing a mechanism to preserve typed errors. This is possible because &lt;code&gt;GenericWriter&lt;/code&gt; relies on the various methods of &lt;code&gt;AnyWriter&lt;/code&gt;, like &lt;code&gt;writeAll&lt;/code&gt; and &lt;code&gt;writeByteNTimes&lt;/code&gt;. So while there will be many copies of &lt;code&gt;GenericWriter&lt;/code&gt; (for &lt;code&gt;File&lt;/code&gt;, &lt;code&gt;Stream&lt;/code&gt;, &lt;code&gt;ArrayList(u8)&lt;/code&gt;, etc.), each has very small functions which only invoke the &lt;code&gt;AnyWriter&lt;/code&gt; logic and re-type the error. For example, here&#39;s &lt;code&gt;GenericWriter.writeAll&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub inline fn writeAll(self: Self, bytes: []const u8) Error!void {
    return @errorCast(self.any().writeAll(bytes));
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We see that &lt;code&gt;@errorCast&lt;/code&gt; does the heavy lifting, converting the &lt;code&gt;anyerror&lt;/code&gt; that &lt;code&gt;AnyWriter.writeAll&lt;/code&gt; returns into the narrow type-specific error for this implementation.&lt;/p&gt;
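
&lt;p&gt;You can use &lt;code&gt;@errorCast&lt;/code&gt; in your own code too. Here&#39;s a minimal sketch; note that the cast is safety-checked, so narrowing to a set that doesn&#39;t contain the actual error is illegal behavior:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn narrow(result: anyerror!void) error{OutOfMemory}!void {
    // converts the project-wide error set back to the narrow one;
    // in safe builds, a mismatched error triggers a panic
    return @errorCast(result);
}&lt;/code&gt;&lt;/pre&gt;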

&lt;p&gt;Like I said, for everyday programming, you&#39;ll mostly be passing the result of &lt;code&gt;writer()&lt;/code&gt; to a &lt;code&gt;writer: anytype&lt;/code&gt; function parameter. And it mostly works, possibly after you&#39;ve wasted time trying to figure out exactly what the requirements for the &lt;code&gt;writer&lt;/code&gt; are. It&#39;s only when you can&#39;t use &lt;code&gt;anytype&lt;/code&gt;, i.e. in a structure field, that this &lt;code&gt;GenericWriter&lt;/code&gt; / &lt;code&gt;AnyWriter&lt;/code&gt; chimera becomes something you need to be aware of.&lt;/p&gt;

&lt;p&gt;Hopefully the situation can be improved, specifically with some of the &lt;a href=&quot;https://github.com/ziglang/zig/issues/21566&quot;&gt;performance&lt;/a&gt; &lt;a href=&quot;https://github.com/ziglang/zig/issues/17985&quot;&gt;issues&lt;/a&gt; and resulting poor documentation.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Using SIMD to Tell if the High Bit is Set</title>
		<link href="https://www.openmymind.net/Using-SIMD-to-Tell-if-the-High-Bit-is-Set/"/>
		<updated>2025-01-21T00:00:00Z</updated>
		<id>/Using-SIMD-to-Tell-if-the-High-Bit-is-Set/</id>
		<content type="html">
			
&lt;p&gt;One of the first Zig-related blog posts I wrote was &lt;a href=&quot;https://www.openmymind.net/SIMD-With-Zig/&quot;&gt;an overview of SIMD with Zig&lt;/a&gt;. I recently needed to revisit this topic when enhancing my &lt;a href=&quot;https://github.com/karlseguin/smtp_client.zig&quot;&gt;smtp client library&lt;/a&gt;. Specifically, SMTP mostly expects printable ASCII characters. Almost all other characters, including UTF-8 text, must be encoded.&lt;/p&gt;

&lt;p&gt;I found the various SMTP and MIME RFCs confusing. So I settled on a simple approach: if the high bit of a character is set, I&#39;ll base64 encode the value. The simple way to detect this is:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn isHighBitSet(input: []const u8) bool {
    for (input) |c| {
        if (c &gt; 127) {
            return true;
        }
    }
    return false;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But with SIMD, we can do this check on multiple bytes at a time. The first thing we have to do is get the ideal size for SIMD operations for our CPU:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if (comptime std.simd.suggestVectorLength(u8)) |vector_len| {
    // TODO
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can tell, &lt;code&gt;suggestVectorLength&lt;/code&gt; returns an optional value: some platforms don&#39;t support SIMD. On my computer, this returns &lt;code&gt;16&lt;/code&gt;, which means that I can process 16 bytes at a time. We can extend our skeleton:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if (comptime std.simd.suggestVectorLength(u8)) |vector_len| {
    var remaining = input;
    while (remaining.len &gt; vector_len) {
        const chunk: @Vector(vector_len, u8) = remaining[0..vector_len].*;
        // TODO
        remaining = remaining[vector_len..];
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Above we&#39;re breaking our input into &lt;code&gt;vector_len&lt;/code&gt; chunks. The &lt;code&gt;@Vector&lt;/code&gt; builtin returns a type (in Zig, by convention, upper-case functions return types). To see if our &lt;code&gt;chunk&lt;/code&gt; has a high bit set, we use &lt;code&gt;@reduce&lt;/code&gt;:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if (@reduce(.Max, chunk) &gt; 127) {
    return true;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Like &lt;code&gt;@Vector&lt;/code&gt;, &lt;code&gt;@reduce&lt;/code&gt; is one of a handful of SIMD-specific builtins. Its job is to take an &lt;code&gt;std.builtin.ReduceOp&lt;/code&gt; and a vector input (&lt;code&gt;.Max&lt;/code&gt; and &lt;code&gt;chunk&lt;/code&gt;) and return a scalar value. The possible operations depend on the vector type. For example, &lt;code&gt;std.builtin.ReduceOp.And&lt;/code&gt; is only valid for a vector of booleans. With &lt;code&gt;Max&lt;/code&gt;, we&#39;re asking &lt;code&gt;@reduce&lt;/code&gt; to return the largest value in the provided vector.&lt;/p&gt;
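
&lt;p&gt;A standalone example, separate from our scanning loop, might make &lt;code&gt;@reduce&lt;/code&gt; clearer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() void {
    const chunk: @Vector(4, u8) = .{3, 200, 7, 1};

    // .Max collapses the vector to its largest element: 200
    std.debug.print(&quot;{d}&#92;n&quot;, .{@reduce(.Max, chunk)});

    // .Or on a vector of bools: true if any element is true
    const flags: @Vector(4, bool) = .{false, true, false, false};
    std.debug.print(&quot;{}&#92;n&quot;, .{@reduce(.Or, flags)});
}&lt;/code&gt;&lt;/pre&gt;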

&lt;p&gt;As I said, on my computer, the code will process 16 bytes of data at a time, but our input might not be perfectly divisible by 16. Our &lt;code&gt;while&lt;/code&gt; loop only runs while &lt;code&gt;remaining.len &gt; vector_len&lt;/code&gt;; if our input was 35 bytes long, we&#39;d process 2 chunks (2 * 16) and be left with 3 bytes. These last 3 bytes still need to be checked:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
fn isHighBitSet(input: []const u8) bool {
    var remaining = input;
    if (comptime std.simd.suggestVectorLength(u8)) |vector_len| {
        while (remaining.len &gt; vector_len) {
            const chunk: @Vector(vector_len, u8) = remaining[0..vector_len].*;
            if (@reduce(.Max, chunk) &gt; 127) {
                return true;
            }
            remaining = remaining[vector_len..];
        }
    }

    for (remaining) |c| {
        if (c &gt; 127) {
            return true;
        }
    }

    return false;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We&#39;ve made another subtle change: we moved &lt;code&gt;remaining&lt;/code&gt; to the outer scope. Our code not only handles inputs that aren&#39;t perfectly divisible by &lt;code&gt;vector_len&lt;/code&gt;, it also handles cases where &lt;code&gt;suggestVectorLength&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, more recent versions of Zig have introduced different backends. Not all of these necessarily support SIMD operations. So, for completeness, we need one more check:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
// strange that std.simd doesn&#39;t export something like this
const backend_supports_vectors = switch (@import(&quot;builtin&quot;).zig_backend) {
    .stage2_llvm, .stage2_c =&gt; true,
    else =&gt; false,
};

fn isHighBitSet(input: []const u8) bool {
    var remaining = input;
    if (comptime backend_supports_vectors) {
        if (comptime std.simd.suggestVectorLength(u8)) |vector_len| {
            while (remaining.len &gt; vector_len) {
                const chunk: @Vector(vector_len, u8) = remaining[0..vector_len].*;
                if (@reduce(.Max, chunk) &gt; 127) {
                    return true;
                }
                remaining = remaining[vector_len..];
            }
        }
    }

    for (remaining) |c| {
        if (c &gt; 127) {
            return true;
        }
    }

    return false;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Hopefully this is something that will get cleaned up, but note that it&#39;s only really necessary if you&#39;re going to use a different backend (I&#39;m using it because the code is in a library, and I don&#39;t know how users of my library will compile it).&lt;/p&gt;

&lt;p&gt;For very short strings, the SIMD version is the same as the linear version, so there&#39;s no performance difference. But for a string of &lt;code&gt;&quot;a&quot; ** 100&lt;/code&gt; (which requires scanning the entire string), the SIMD version is ~2x faster and for a string of &lt;code&gt;&quot;a&quot; ** 1000&lt;/code&gt;, it&#39;s ~8x faster.&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;If you want to learn more, check out my earlier &lt;a href=&quot;https://www.openmymind.net/SIMD-With-Zig/&quot;&gt;SIMD With Zig&lt;/a&gt; post, which goes into greater detail with a more complicated example.&lt;/p&gt;&lt;/aside&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Peeking Behind Zig Interfaces by Creating a Dummy std.Random Implementation</title>
		<link href="https://www.openmymind.net/Peeking-Behind-Interfaces-by-Creating-a-Dummy-std-Random/"/>
		<updated>2025-01-15T00:00:00Z</updated>
		<id>/Peeking-Behind-Interfaces-by-Creating-a-Dummy-std-Random/</id>
		<content type="html">
			
&lt;p&gt;Zig doesn&#39;t have an &lt;code&gt;interface&lt;/code&gt; keyword or some simple way to create interfaces. Nonetheless, types like &lt;code&gt;std.mem.Allocator&lt;/code&gt; and &lt;code&gt;std.Random&lt;/code&gt; are often cited as examples that, with a bit of elbow grease, Zig offers everything needed to create them.&lt;/p&gt;

&lt;p&gt;We&#39;ve looked at &lt;a href=&quot;https://www.openmymind.net/Zig-Interfaces/&quot;&gt;Zig Interfaces&lt;/a&gt; in the past. If you want to understand how they work and how to write your own, I recommend you read that post first. But I recently needed to create a dummy &lt;code&gt;std.Random&lt;/code&gt; implementation (for testing purposes) and felt that the experience was a nice refresher.&lt;/p&gt;

&lt;p&gt;When I think about an interface, I think about a contract with no implementation. But if we look at most types in Zig&#39;s standard library which people call an &quot;interface&quot;, it&#39;s usually something different. Interfaces in Zig have a tendency to expose their own behavior which enhances an underlying implementation&#39;s algorithm. For example, the &lt;code&gt;std.io.Reader&lt;/code&gt; &quot;interface&quot; has a &lt;code&gt;readAtLeast&lt;/code&gt; method. &lt;code&gt;readAtLeast&lt;/code&gt; is implemented directly in &lt;code&gt;std.io.Reader&lt;/code&gt;, using the underlying implementation&#39;s &lt;code&gt;read&lt;/code&gt; method (that underlying implementation could be a file, a socket, etc.).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;std.Random&lt;/code&gt; is no different: methods like &lt;code&gt;intRangeAtMost&lt;/code&gt; are implemented within the &lt;code&gt;std.Random&lt;/code&gt; type itself. These methods rely on behavior from the underlying implementation. In order to write our own [mock] implementation, we need to know what method(s) &lt;code&gt;std.Random&lt;/code&gt; needs us to implement. If you&#39;re already comfortable in Zig, you can probably look at the documentation for &lt;code&gt;std.Random&lt;/code&gt; and figure it out, although it isn&#39;t explicitly stated. You&#39;d see that it has two fields:&lt;/p&gt;

&lt;ol&gt;
    &lt;li&gt;&lt;code&gt;ptr: *anyopaque&lt;/code&gt;,&lt;/li&gt;
    &lt;li&gt;&lt;code&gt;fillFn: *const fn (ptr: *anyopaque, buf: []u8) void&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;and realize that this interface requires a single &lt;code&gt;fill&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Another way to try to divine the requirements would be to look at an existing implementation. For example, if we look at &lt;code&gt;std.Random.DefaultPrng&lt;/code&gt;, we&#39;ll be brought to the &lt;code&gt;std.Random.Xoshiro256&lt;/code&gt; type, where we can find the &lt;code&gt;random&lt;/code&gt; method. This is the method we call on an implementation to get an &lt;code&gt;std.Random&lt;/code&gt; interface. Just like you call &lt;code&gt;allocator&lt;/code&gt; on a GPA to get an &lt;code&gt;std.mem.Allocator&lt;/code&gt;. The implementation of &lt;code&gt;random&lt;/code&gt; is:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn random(self: *Xoshiro256) std.Random {
    return std.Random.init(self, fill);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This tells us that, if we want to create an &lt;code&gt;std.Random&lt;/code&gt;, we can use its &lt;code&gt;init&lt;/code&gt; function. &lt;code&gt;std.Random.init&lt;/code&gt; has the following signature:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn init(
    pointer: anytype,
    comptime fillFn: fn (ptr: @TypeOf(pointer), buf: []u8) void,
) Random&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Thus, &lt;code&gt;init&lt;/code&gt; expects a pointer of any type as well as a function pointer. Knowing this, we can take a stab at writing our dummy implementation:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var dr = DummyRandom{};
    var random = dr.random();

    std.debug.print(&quot;{d}&#92;n&quot;, .{random.int(u8)});
}

const DummyRandom = struct {
    pub fn fill(_: *DummyRandom, buf: []u8) void {
        @memset(buf, 0);
    }

    pub fn random(self: *DummyRandom) std.Random {
        return std.Random.init(self, fill);
    }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code works, but can we make it less verbose? In normal cases, such as with the real &lt;code&gt;Xoshiro256&lt;/code&gt; implementation, the underlying instance exists because it maintains some state (such as a seed). That&#39;s why &lt;code&gt;std.Random&lt;/code&gt; maintains a &lt;code&gt;pointer&lt;/code&gt; to the instance and then passes it back to the given &lt;code&gt;fill&lt;/code&gt; function. Our implementation is dumb though. Do we really need the &lt;code&gt;DummyRandom&lt;/code&gt; structure and an instance of it?&lt;/p&gt;

&lt;aside&gt;&lt;p&gt;The following assumes a familiarity with Zig&#39;s &lt;code&gt;*anyopaque&lt;/code&gt;, &lt;code&gt;@ptrCast&lt;/code&gt; and &lt;code&gt;@constCast&lt;/code&gt;. These are all topics we&#39;ve covered before in &lt;a href=&quot;https://www.openmymind.net/Zig-Interfaces/&quot;&gt;Zig Interfaces&lt;/a&gt;, &lt;a href=&quot;https://www.openmymind.net/Zig-Tiptoeing-Around-ptrCast/&quot;&gt;Tiptoeing Around @ptrCast&lt;/a&gt; and &lt;a href=&quot;https://www.openmymind.net/Zigs-ConstCast/&quot;&gt;Zig&#39;s @constCast&lt;/a&gt;.&lt;/p&gt;&lt;/aside&gt;

&lt;p&gt;If we try to pass &lt;code&gt;void&lt;/code&gt; as our type, and use an anonymous struct, we can tighten up the code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var random = std.Random.init({}, struct {
        pub fn fill(_: void, buf: []u8) void {
            @memset(buf, 0);
        }
    }.fill);

    std.debug.print(&quot;{d}&#92;n&quot;, .{random.int(u8)});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But it won&#39;t compile. We get the following error: &lt;em&gt;access of union field &#39;pointer&#39; while field &#39;void&#39; is active&lt;/em&gt;. Looking at the implementation of &lt;code&gt;std.Random.init&lt;/code&gt;, we see all of these compile-time checks:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn init(pointer: anytype, comptime fillFn: fn (ptr: @TypeOf(pointer), buf: []u8) void) Random {
    const Ptr = @TypeOf(pointer);
    assert(@typeInfo(Ptr) == .pointer); // Must be a pointer
    assert(@typeInfo(Ptr).pointer.size == .One); // Must be a single-item pointer
    assert(@typeInfo(@typeInfo(Ptr).pointer.child) == .@&quot;struct&quot;); // Must point to a struct&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Essentially, we &lt;em&gt;must&lt;/em&gt; pass a pointer to a structure, e.g. a pointer to a &lt;code&gt;Xoshiro256&lt;/code&gt; or &lt;code&gt;DummyRandom&lt;/code&gt; or whatever. From what I can tell, there&#39;s no good reason for this restriction. &lt;code&gt;std.Random&lt;/code&gt; only uses the provided &lt;code&gt;pointer&lt;/code&gt; to pass it back to the provided &lt;code&gt;fill&lt;/code&gt; function - it shouldn&#39;t care if it&#39;s a struct, an integer, or void.&lt;/p&gt;

&lt;p&gt;To get around this, we&#39;ll need to circumvent &lt;code&gt;init&lt;/code&gt; and set the fields directly:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var random = std.Random{
        .ptr = {},
        .fillFn = struct {
            pub fn fill(_: *anyopaque, buf: []u8) void {
                @memset(buf, 0);
            }
        }.fill,
    };
    std.debug.print(&quot;{d}&#92;n&quot;, .{random.int(u8)});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This also gives us an error: &lt;em&gt;expected type &#39;*anyopaque&#39;, found &#39;void&#39;&lt;/em&gt;. That seems right to me. The &lt;code&gt;ptr&lt;/code&gt; field is of type &lt;code&gt;*anyopaque&lt;/code&gt;, and we&#39;re trying to assign &lt;code&gt;void&lt;/code&gt;. We can&#39;t just &lt;code&gt;@ptrCast({})&lt;/code&gt;, because &lt;code&gt;@ptrCast&lt;/code&gt; expects a pointer, but what if we try &lt;code&gt;@ptrCast(&amp;{})&lt;/code&gt;?&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var random = std.Random{
        // added @ptrCast and switch {} to &amp;{}
        .ptr = @ptrCast(&amp;{}),
        .fillFn = struct {
            pub fn fill(_: *anyopaque, buf: []u8) void {
                @memset(buf, 0);
            }
        }.fill,
    };
    std.debug.print(&quot;{d}&#92;n&quot;, .{random.int(u8)});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We get a different error: &lt;em&gt;@ptrCast discards const qualifier&lt;/em&gt;. So now our problem is that our void pointer, &lt;code&gt;&amp;{}&lt;/code&gt;, is &lt;code&gt;const&lt;/code&gt;, but the &lt;code&gt;ptr&lt;/code&gt; field is an &lt;code&gt;*anyopaque&lt;/code&gt;, &lt;strong&gt;not&lt;/strong&gt; a &lt;code&gt;*const anyopaque&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Since we&#39;re already using &lt;code&gt;@ptrCast&lt;/code&gt;, which is always questionable, why not add an even more questionable &lt;code&gt;@constCast&lt;/code&gt;?:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var random = std.Random{
        // added @constCast
        .ptr = @constCast(@ptrCast(&amp;{})),
        .fillFn = struct {
            pub fn fill(_: *anyopaque, buf: []u8) void {
                @memset(buf, 0);
            }
        }.fill,
    };
    std.debug.print(&quot;{d}&#92;n&quot;, .{random.int(u8)});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code works. It&#39;s safe because our &lt;code&gt;fill&lt;/code&gt; implementation never uses the pointer, so the invalid &lt;code&gt;const&lt;/code&gt; discard is never a factor. But it&#39;s unsafe because, in theory, &lt;code&gt;std.Random&lt;/code&gt; could one day change to use &lt;code&gt;self.ptr&lt;/code&gt; itself, or to assume that it&#39;s a pointer to a struct - which is what its &lt;code&gt;init&lt;/code&gt; function enforces.&lt;/p&gt;

&lt;p&gt;Creating our &lt;code&gt;DummyRandom&lt;/code&gt; and going through &lt;code&gt;std.Random.init&lt;/code&gt; is safer and the &lt;em&gt;right way.&lt;/em&gt; But, creating &lt;code&gt;std.Random&lt;/code&gt; directly is more fun.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Comptime as Configuration</title>
		<link href="https://www.openmymind.net/Comptime-as-Configuration/"/>
		<updated>2025-01-10T00:00:00Z</updated>
		<id>/Comptime-as-Configuration/</id>
		<content type="html">
			
&lt;p&gt;If you look at my &lt;a href=&quot;https://github.com/karlseguin/http.zig&quot;&gt;httpz&lt;/a&gt; library, you&#39;ll notice that &lt;code&gt;httpz.Server(T)&lt;/code&gt; is a generic. The type passed to &lt;code&gt;Server&lt;/code&gt; serves two purposes. The first is to support an application-specific context - whatever instance of &lt;code&gt;T&lt;/code&gt; is passed into &lt;code&gt;Server(T).init&lt;/code&gt; gets passed back to your custom HTTP handlers.&lt;/p&gt;

&lt;p&gt;But the other purpose of &lt;code&gt;T&lt;/code&gt; is to act as a configuration object. For example, if you want to circumvent most of httpz&#39; request processing, you can define a &lt;code&gt;T.handle&lt;/code&gt; method:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const App = struct {
  pub fn handle(app: *App, req: *Request, res: *Response) void {
     // circumvents httpz&#39; routing, middleware, error handling and dispatching
  }
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is how &lt;a href=&quot;https://www.jetzig.dev/&quot;&gt;Jetzig&lt;/a&gt; uses httpz. In my &lt;a href=&quot;https://www.openmymind.net/Basic-MetaProgramming-in-Zig/&quot;&gt;Basic MetaProgramming&lt;/a&gt; post, we looked at how a few of Zig&#39;s built-in functions and the &lt;code&gt;std.meta&lt;/code&gt; namespace can help us write this kind of code. For the specific case of the &lt;code&gt;handle&lt;/code&gt; override, it looks something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if (comptime std.meta.hasFn(Handler, &quot;handle&quot;)) {
    if (comptime @typeInfo(@TypeOf(Handler.handle)).@&quot;fn&quot;.return_type != void) {
        @compileError(@typeName(Handler) ++ &quot;.handle must return &#39;void&#39;&quot;);
    }
    self.handler.handle(&amp;req, &amp;res);
    return;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The return-type check is there to make it clear that the custom &lt;code&gt;handle&lt;/code&gt; cannot return an error (or anything else). There are a few different possible overrides in httpz, but they&#39;re more or less variations of the above.&lt;/p&gt;

&lt;p&gt;More recently, in &lt;a href=&quot;https://github.com/karlseguin/ztl&quot;&gt;Zig Template Language&lt;/a&gt;, I extended this pattern to include scalar configuration:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const AppTemplate = struct {
    pub const ZtlConfig = struct {
        pub const escape_by_default = true;
        pub const deduplicate_string_literals = true;
    };

    // can also define custom functions
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To get a specific configuration value, you do:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const Defaults = struct {
    pub const escape_by_default: bool = false;
    pub const deduplicate_string_literals: bool = true;
};

pub fn extract(comptime A: type, comptime field_name: []const u8) @TypeOf(@field(Defaults, field_name)) {
    const App = switch (@typeInfo(A)) {
        .@&quot;struct&quot; =&gt; A,
        .pointer =&gt; |ptr| ptr.child,
        .void =&gt; void,
        else =&gt; @compileError(&quot;Template App must be a struct, got: &quot; ++ @tagName(@typeInfo(A))),
    };

    if (App != void and @hasDecl(App, &quot;ZtlConfig&quot;) and @hasDecl(App.ZtlConfig, field_name)) {
        return @field(App.ZtlConfig, field_name);
    }

    return @field(Defaults, field_name);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One reason I went with this approach is that, as with httpz, the type is needed anyway. Like httpz, it&#39;s possible to extend the functionality of ztl and add an application-specific context. The obvious downside is that the user of the library has to create a comptime-known configuration.&lt;/p&gt;

&lt;p&gt;The benefit of this approach is the opportunity for some optimization. In most cases, that&#39;s simply being able to do conditional checks at compile-time rather than runtime. But some optimizations can be a little more meaningful. For example, ztl&#39;s VM will use a &lt;code&gt;u8&lt;/code&gt; or a &lt;code&gt;u16&lt;/code&gt; depending on the &lt;code&gt;max_locals&lt;/code&gt; configuration, and because &lt;code&gt;max_call_frames&lt;/code&gt; is known at compile time, the VM&#39;s call stack can be allocated on the stack.&lt;/p&gt;

&lt;p&gt;I&#39;m not suggesting that all configuration should be like this. However, if you&#39;re building a library and want to provide hooks for users to override or add behavior, I think doing feature detection on a provided &lt;code&gt;T: type&lt;/code&gt; is a good approach. Unless you have a really good reason to, you probably should not do this for normal options - it makes your library much more rigid by requiring that the user of your library know the options at comptime.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Zig&#39;s @bitCast</title>
		<link href="https://www.openmymind.net/Zigs-bitCast/"/>
		<updated>2025-01-03T00:00:00Z</updated>
		<id>/Zigs-bitCast/</id>
		<content type="html">
			
&lt;p&gt;In an older post, we explored &lt;a href=&quot;https://www.openmymind.net/Zig-Tiptoeing-Around-ptrCast/&quot;&gt;Zig&#39;s &lt;code&gt;@ptrCast&lt;/code&gt;&lt;/a&gt; and then looked at a concrete usage in the shape of &lt;a href=&quot;https://www.openmymind.net/Zig-MemoryPool-Allocator/&quot;&gt;std.heap.MemoryPool&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;ptrCast&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#ptrCast&quot; aria-hidden=&quot;true&quot;&gt;@ptrCast&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As a brief recap, when we use &lt;code&gt;@ptrCast&lt;/code&gt;, we&#39;re telling the compiler to treat a pointer as a given type. For example, this code runs fine:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
const std = @import(&quot;std&quot;);

pub fn main() !void {
    var user = User{.power = 9001, .name = &quot;Goku&quot;, .active = true};

    const cat: *Cat = @ptrCast(&amp;user);
    std.debug.print(&quot;{any}&#92;n&quot;, .{cat});
}

const User = struct {
    power: i64,
    name: []const u8,
    active: bool,
};

const Cat = struct {
    id: i64,
    breed: []const u8,
};&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The compiler knows that &lt;code&gt;user&lt;/code&gt; is a &lt;code&gt;User&lt;/code&gt;, but we&#39;re saying &quot;trust me and treat that memory as though it&#39;s a &lt;code&gt;Cat&lt;/code&gt;&quot;. The reason this &quot;works&quot; is because &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Cat&lt;/code&gt; have a similar shape. If we swapped the order of &lt;code&gt;Cat&lt;/code&gt;&#39;s two fields, the code would almost certainly crash. Why? Because &lt;code&gt;user.power&lt;/code&gt; would become &lt;code&gt;cat.breed.len&lt;/code&gt;, causing &lt;code&gt;std.debug.print&lt;/code&gt; to try to print a 9001-byte-long string, reaching into memory it does not have access to.&lt;/p&gt;

&lt;p&gt;It&#39;s possible that you&#39;ll never use &lt;code&gt;@ptrCast&lt;/code&gt;, but I find it a fascinating example of how the compiler works.&lt;/p&gt;

&lt;h3 id=&quot;bitCast&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#bitCast&quot; aria-hidden=&quot;true&quot;&gt;@bitCast&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;As the name implies, &lt;code&gt;@ptrCast&lt;/code&gt; works with pointers, but what if we want to do something for non-pointers? Unfortunately, if we modify the above, removing the pointers and using &lt;code&gt;@bitCast&lt;/code&gt; instead of &lt;code&gt;@ptrCast&lt;/code&gt;, we get a compile-time error:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    const user = User{.power = 9001, .name = &quot;Goku&quot;, .active = true};
    const cat: Cat = @bitCast(user);
    std.debug.print(&quot;{any}&#92;n&quot;, .{cat});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;cannot @bitCast to &#39;Cat&#39;; struct does not have a guaranteed in-memory layout&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I don&#39;t think there&#39;s a technical reason the above doesn&#39;t work. It&#39;s just that, given Zig structures don&#39;t have a guaranteed memory layout, it isn&#39;t a good idea and thus is forbidden (although, I can&#39;t quite figure out why &lt;code&gt;@ptrCast&lt;/code&gt; allows it but &lt;code&gt;@bitCast&lt;/code&gt; doesn&#39;t!)&lt;/p&gt;

&lt;p&gt;This &lt;em&gt;does&lt;/em&gt; work though:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    const n: i64 = 1234567890;
    const f: f64 = @bitCast(n);
    std.debug.print(&quot;{}&#92;n&quot;, .{f});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And prints &lt;code&gt;6.09957582e-315&lt;/code&gt;. If we try this between a boolean and an integer, we&#39;ll get an error:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    const b = true;
    const n: i64 = @bitCast(b);
    std.debug.print(&quot;{d}&#92;n&quot;, .{n});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;@bitCast size mismatch: destination type &#39;i64&#39; has 64 bits but source type &#39;bool&#39; has 1 bits&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If we change the type of &lt;code&gt;n&lt;/code&gt; from &lt;code&gt;i64&lt;/code&gt; to &lt;code&gt;u1&lt;/code&gt;, it works:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    const b = true;
    const n: u1 = @bitCast(b);
    std.debug.print(&quot;{d}&#92;n&quot;, .{n});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This prints &lt;code&gt;1&lt;/code&gt;. If we changed &lt;code&gt;true&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt;, we&#39;d get &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;@bitCast&lt;/code&gt; tells the compiler to take a value and treat it as the given type. For example, say that we have the value &lt;code&gt;1000&lt;/code&gt; as &lt;code&gt;u16&lt;/code&gt;. The binary representation for that is &lt;code&gt;1111101000&lt;/code&gt;. If we &lt;code&gt;@bitCast&lt;/code&gt; this to an &lt;code&gt;f16&lt;/code&gt;, it&#39;s the same data, the same binary &lt;code&gt;1111101000&lt;/code&gt;, but interpreted as a float (which is, apparently, &lt;code&gt;5.96e-5&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Obviously, you generally won&#39;t &lt;code&gt;@bitCast&lt;/code&gt; from ints to floats or booleans. But it is useful when dealing with binary data. For example, the 4-byte little-endian representation of the number 1000 is &lt;code&gt;[4]u8{ 232, 3, 0, 0 }&lt;/code&gt;. We can use &lt;code&gt;@bitCast&lt;/code&gt; to take that binary representation and treat it as an integer:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
pub fn main() !void {
    const data = [_]u8{232, 3, 0, 0};
    const n: i32 = @bitCast(data);
    std.debug.print(&quot;{}&#92;n&quot;, .{n});
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are two important things to keep in mind. The first is that this isn&#39;t a conversion. There isn&#39;t runtime code executing that takes &lt;code&gt;data&lt;/code&gt; and turns it into an integer. Rather, this is us telling the compiler to treat the data as an integer.&lt;/p&gt;

&lt;p&gt;Secondly, there are various ways to represent any type of data. Above, we said that 1000 is represented as &lt;code&gt;[_]u8{232, 3, 0, 0}&lt;/code&gt;, but it could also be represented as &lt;code&gt;[_]u8{0, 0, 3, 232}&lt;/code&gt; (e.g. using big-endian) - and there are more than just these two ways. So &lt;code&gt;@bitCast&lt;/code&gt; only makes sense if you&#39;re sure about the bit-representation of the data. Generally, you only want to &lt;code&gt;@bitCast&lt;/code&gt; data that comes from your own running program. If you&#39;re parsing an external protocol, you should use &lt;code&gt;std.mem.readInt&lt;/code&gt; instead. &lt;code&gt;readInt&lt;/code&gt; lets you specify the endianness of the data and, interestingly, if the endianness of the data happens to be the same as the endianness of the platform, it ends up just being a call to &lt;code&gt;@bitCast&lt;/code&gt;.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Basic Awareness in Addition to Deep Understanding</title>
		<link href="https://www.openmymind.net/Basic-Awareness-In-Addition-To-Deep-Understanding/"/>
		<updated>2024-12-26T00:00:00Z</updated>
		<id>/Basic-Awareness-In-Addition-To-Deep-Understanding/</id>
		<content type="html">
			
&lt;p&gt;Software developers are often evaluated based on how well they understand specific ideas and tools. While mastery &lt;em&gt;is&lt;/em&gt; important, there&#39;s another type of knowledge I find myself relying on: vague awareness. Unlike mastery, awareness is merely knowing that something exists along with a basic understanding of what it is and what problem it can solve.&lt;/p&gt;

&lt;p&gt;For example, I love regular expressions and, within reason, I&#39;m pretty comfortable with them. But I only have a vague understanding of &lt;a href=&quot;https://www.openmymind.net/2011/2/16/Regex-Positive-Negative-Look-Ahead-Behind/&quot;&gt;positive and negative lookahead and lookbehind expressions&lt;/a&gt;. I certainly don&#39;t know the exact syntax, but I do know it isn&#39;t something every regular expression engine supports. Importantly, I know that it has something to do with matching without consuming the match / non-match. Given a problem, I think I understand it well enough to be able to identify these fancy expressions as a possible solution.&lt;/p&gt;

&lt;p&gt;Another example is the SQL lead/lag window functions. In fact, when it comes to SQL, there are probably a lot of examples I could pick (lateral joins, recursive CTEs). But I particularly like this example because (a) they&#39;re super useful and (b) they remind me that there&#39;s a bunch of windowing functions I don&#39;t remember.&lt;/p&gt;

&lt;p&gt;The list of things I hardly know is long. Bash, systemd, memory mapped files, io_uring, Makefile, DNS, UDP, etc. But when the tools I &lt;em&gt;have&lt;/em&gt; mastered either don&#39;t apply or aren&#39;t well suited to a specific problem, I hopefully know these well enough to jumpstart finding a good solution.&lt;/p&gt;

&lt;p&gt;This is one of the reasons I blog. I find that I retain things better when I write about them. You couldn&#39;t tell from all the typos and spelling errors, but I&#39;ll re-read my blog posts 3-4 times before publishing. That act of writing and reading helps me retain the information. It&#39;s also a good reference for my future self. More than once I&#39;ve used my blog to re-learn something I once understood better. It&#39;s probably universally true that the way we explain things to others is the way we want things explained to us.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Sorting Strings in Zig</title>
		<link href="https://www.openmymind.net/Sorting-Strings-in-Zig/"/>
		<updated>2024-12-19T00:00:00Z</updated>
		<id>/Sorting-Strings-in-Zig/</id>
		<content type="html">
			
&lt;p&gt;First, the code:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
std.mem.sort([]const u8, values, {}, stringLessThan);

fn stringLessThan(_: void, lhs: []const u8, rhs: []const u8) bool {
    return std.mem.order(u8, lhs, rhs) == .lt;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;std.mem.sort&lt;/code&gt; takes 4 arguments: the type of value we&#39;re sorting, the list of values to sort, an arbitrary context, and a function. The last argument, the function, is what determines how two values should be ordered with respect to each other. Above we&#39;re doing a byte-by-byte comparison of two strings. If we wanted to do a case-insensitive comparison of ASCII values, we&#39;d replace &lt;code&gt;std.mem.order&lt;/code&gt; with &lt;code&gt;std.ascii.orderIgnoreCase&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The third parameter is an application-specific context. This gets passed into our ordering function. It can be anything we want. Oftentimes, as above, we don&#39;t need a context, so we pass void (&lt;code&gt;{}&lt;/code&gt;). However, imagine we wanted to boost certain values. In other words, we want to sort a list of strings, but have certain values, if present, appear at the front:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
var boost = std.StringHashMap(u32).init(allocator);
try boost.put(&quot;Keemun&quot;, 100);
try boost.put(&quot;Silver Needle&quot;, 25);

std.mem.sort([]const u8, values, boost, stringLessThan);

fn stringLessThan(boost: std.StringHashMap(u32), lhs: []const u8, rhs: []const u8) bool {
    const lhs_boost = boost.get(lhs) orelse 0;
    const rhs_boost = boost.get(rhs) orelse 0;
    if (lhs_boost &gt; rhs_boost)  {
        return true;
    }
    if (lhs_boost &lt; rhs_boost)  {
        return false;
    }

    return std.mem.order(u8, lhs, rhs) == .lt;
}&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;This concept of an application-specific context is something you&#39;ll often see in Zig libraries (including the standard library). It fills the gap caused by not having easy closures. In the above code, we can&#39;t create a closure that captures &lt;code&gt;boost&lt;/code&gt;; instead, we create a function and pass &lt;code&gt;boost&lt;/code&gt; as an argument. Pretty simple.&lt;/p&gt;

&lt;p&gt;You might see the above code written differently:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
std.mem.sort([]const u8, values, {}, struct {
    fn lessThan(_: void, lhs: []const u8, rhs: []const u8) bool {
        return std.mem.order(u8, lhs, rhs) == .lt;
    }
}.lessThan);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This might look a bit like a closure, but it isn&#39;t. We still need to pass our context in the 3rd parameter of &lt;code&gt;std.mem.sort&lt;/code&gt; and accept it as the first parameter of our custom &lt;code&gt;lessThan&lt;/code&gt; function. The above code creates an anonymous structure. An anonymous structure is like any other structure, but instead of having an explicit name, we leave it up to the compiler. The compiler will generate something like &lt;code&gt;blog.main__struct_6782&lt;/code&gt;. Without an explicit name, we can&#39;t really make use of it, except for where it&#39;s defined - which is what our above code is doing.&lt;/p&gt;

&lt;h3 id=&quot;std_sort&quot;&gt;&lt;a href=&quot;https://www.openmymind.net/#std_sort&quot; aria-hidden=&quot;true&quot;&gt;std.sort&lt;/a&gt;&lt;/h3&gt;
&lt;p&gt;In addition to &lt;code&gt;std.mem.sort&lt;/code&gt;, there&#39;s also &lt;code&gt;std.mem.sortUnstable&lt;/code&gt;. The &quot;stability&quot; of a sort has to do with how equal values are treated. &lt;code&gt;std.mem.sort&lt;/code&gt; is a stable sort, which means that if two values are equal, the original order is preserved. When using an unstable sort, there&#39;s no guarantee about how equal values are sorted with respect to each other.&lt;/p&gt;

&lt;p&gt;For scalar values like strings and integers, sort stability probably doesn&#39;t matter. If you have &quot;blue&quot;, &quot;red&quot;, &quot;green&quot;, &quot;blue&quot;, you&#39;ll end up with &quot;blue&quot;, &quot;blue&quot;, &quot;green&quot;, &quot;red&quot;. There isn&#39;t anything to distinguish one &quot;blue&quot; from another, so it won&#39;t matter whether the sorting function put the &quot;blue&quot; which was originally last at the front of the sorted array. But imagine you&#39;re sorting a &lt;code&gt;User&lt;/code&gt; structure based on &lt;code&gt;user.name&lt;/code&gt;; in such cases, you &lt;em&gt;might&lt;/em&gt; want to preserve the original order of equal users (though in my experience, even in these cases you probably won&#39;t care).&lt;/p&gt;

&lt;p&gt;Both &lt;code&gt;std.mem.sort&lt;/code&gt; and &lt;code&gt;std.mem.sortUnstable&lt;/code&gt; are wrappers around functions found in the &lt;code&gt;std.sort&lt;/code&gt; namespace. Specifically, &lt;code&gt;std.mem.sort&lt;/code&gt; calls &lt;code&gt;std.sort.block&lt;/code&gt; and &lt;code&gt;std.mem.sortUnstable&lt;/code&gt; calls &lt;code&gt;std.sort.pdq&lt;/code&gt;. The &lt;code&gt;std.sort&lt;/code&gt; namespace has a few other sorting methods. Here are basic results from sorting 1000 string values:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
std.sort.block
8359 iterations   358870.91ns per iterations
worst: 989333ns   median: 357375ns    stddev: 26749.47ns

std.sort.pdq
11804 iterations  254196.41ns per iterations
worst: 308333ns   median: 253459ns    stddev: 3892.63ns

std.sort.heap
7115 iterations   421603.49ns per iterations
worst: 469083ns   median: 420542ns    stddev: 7155.60ns

std.sort.insertion
693 iterations    4326253.39ns per iterations
worst: 4465917ns  median: 4337208ns   stddev: 166829.84ns&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, if we cut the list to only 5 values, we&#39;ll get different results, with the insertion sort being the fastest. The sorting algorithm you use will depend on the type and size of data you have, and, for some algorithms, how sorted the data already is - and let&#39;s not forget whether or not you need a stable sort. Unless you have a specific reason not to, &lt;code&gt;std.mem.sortUnstable&lt;/code&gt; (which just calls &lt;code&gt;std.sort.pdq&lt;/code&gt;) is probably the best default to use.&lt;/p&gt;

&lt;p&gt;Finally, you might notice the &lt;code&gt;std.sort.asc&lt;/code&gt; and &lt;code&gt;std.sort.desc&lt;/code&gt; functions. You can use them when you&#39;re trying to sort integers or floats: they are like our &lt;code&gt;stringLessThan&lt;/code&gt; function, taking a &lt;code&gt;void&lt;/code&gt; context, but use the less than operator (&lt;code&gt;&amp;lt;&lt;/code&gt;) to compare two values:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
std.mem.sort(i32, values, {}, std.sort.asc(i32));&lt;/code&gt;&lt;/pre&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
	<entry>
		<title>Gluing JSON</title>
		<link href="https://www.openmymind.net/Gluing-JSON/"/>
		<updated>2024-12-09T00:00:00Z</updated>
		<id>/Gluing-JSON/</id>
		<content type="html">
			
&lt;p&gt;If I asked you to respond to an HTTP request with a JSON-serialized list of products, somewhere in your code, you&#39;d probably have (or whatever the equivalent is in your stack):&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
body, err := json.Marshal(products)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There&#39;s an alternative to this approach that I&#39;m rather fond of: gluing pre-serialized JSON pieces together:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
if len(productJSON) == 0 {
    return []byte(&quot;[]&quot;)
}

var buffer bytes.Buffer
buffer.WriteByte(&#39;[&#39;)
buffer.Write(productJSON[0])
for _, json := range productJSON[1:] {
    buffer.WriteByte(&#39;,&#39;)
    buffer.Write(json)
}
buffer.WriteByte(&#39;]&#39;)
return buffer.Bytes()&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Rather than serializing an array of &lt;code&gt;products&lt;/code&gt;, we&#39;re given an array of pre-serialized JSON for each product. We then glue them together to form a valid JSON array. This might seem like a cop-out: something, somewhere is still JSON serializing products in order for our &lt;code&gt;productJSON&lt;/code&gt; to exist. So is there really value in this approach?&lt;/p&gt;

&lt;p&gt;For some systems, gluing pre-serialized messages can provide two benefits: performance and flexibility. With respect to performance, we can create and store the serialized version of a product on write - trading write performance and storage space for better read performance. It&#39;ll depend on the data and the language being used, but gluing JSON can be anywhere from 2x-10x faster.&lt;/p&gt;
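
&lt;p&gt;As a rough sketch of what serialize-on-write might look like (the names here are hypothetical, not from a real codebase), the write path marshals once and stores the bytes, leaving reads to do nothing but glue:&lt;/p&gt;

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type Product struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// cache holds pre-serialized product JSON, populated at write time.
var cache = map[int][]byte{}

// saveProduct pays the serialization cost once, on write.
func saveProduct(p Product) error {
	b, err := json.Marshal(p)
	if err != nil {
		return err
	}
	cache[p.ID] = b
	return nil
}

// listJSON glues the pre-serialized pieces into a JSON array.
func listJSON(ids []int) []byte {
	var buf bytes.Buffer
	buf.WriteByte('[')
	for i, id := range ids {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.Write(cache[id])
	}
	buf.WriteByte(']')
	return buf.Bytes()
}

func main() {
	saveProduct(Product{ID: 1, Name: "tea"})
	saveProduct(Product{ID: 2, Name: "cup"})
	fmt.Println(string(listJSON([]int{1, 2})))
}
```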

&lt;p&gt;That might sound like a weird way to implement a cache. Surely, it would be simpler to use the first approach along with an output cache. It &lt;em&gt;would&lt;/em&gt; be simpler, but it wouldn&#39;t be suitable in all cases. Pre-serialized messages can offer more flexibility. First of all, cache invalidation and staleness aren&#39;t really an issue. Secondly, the individual pieces can be glued together to form different messages.&lt;/p&gt;

&lt;p&gt;Say you have a busy store and decide to have pre-serialized JSON messages for each product. Every place where a product is shown, such as search results, recommendations, past orders, product details, etc., can use the same pre-serialized JSON message.&lt;/p&gt;

&lt;p&gt;If you&#39;re concerned about personalization, that is, the JSON representation of a product changes based on the user, I suggest that the definition of a Product should not change. Instead, you should favor something like:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
{
    &quot;product&quot;: {&quot;id&quot;: 9001, ...},
    &quot;last_bought&quot;: &quot;2023-01-22T14:26:42.002Z&quot;,
    &quot;recommendations&quot;: [
    ]
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally, while gluing JSON might be a little ugly and possibly error prone, it&#39;s well suited to being encapsulated in a library. This is particularly useful if you want to mix and match pre-serialized JSON data with non-serialized data. For example, a library could generate the above:&lt;/p&gt;&lt;pre&gt;&lt;code&gt;
writer := jsonwriter.New()
writer.RootObject(func() {
  // RawField will write the value as-is, with no escaping or any special encoding
  writer.RawField(&quot;product&quot;, preSerializedProduct)

  // Field will JSON encode the value
  writer.Field(&quot;last_bought&quot;, lastBought)

  writer.Array(&quot;recommendations&quot;, func() {
    for _, rec := range recommendations {
        writer.Raw(rec)
    }
  });
})&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Is this something every app should do? No. Is it something people are going to think is pretty hackish? Probably. But until I find a better alternative, I&#39;m going to keep doing it.&lt;/p&gt;

			&lt;p&gt;&lt;a href="#new_comment"&gt;Leave a comment&lt;/a&gt;&lt;/p&gt;
		</content>
	</entry>
</feed>
