
Zig's @constCast

Jul 06, 2024

In the Coding in Zig section of my Learning Zig series, an invalid snippet was recently pointed out to me. The relevant part was:

if (try stdin.readUntilDelimiterOrEof(&buf, '\n')) |line| {
  var name = line;
  if (builtin.os.tag == .windows) {
    name = std.mem.trimRight(u8, name, "\r");
  }
  if (name.len == 0) {
    break;
  }
  try lookup.put(name, .{.power = i});
}

The purpose of the code was to look at more cases of dangling pointers: in this example, name is used as a key in our lookup map, but it isn't valid beyond the if block. To show how clever I am, I included code to deal with Windows' different line endings. However, the code failed to compile for Windows.

This issue has nothing to do with Windows, and it has nothing to do with the issue this example was trying to highlight. So let's disentangle and simplify the error case.

We begin with a working function that normalizes a user's name:

const std = @import("std");

fn normalize(name: []u8) void {
  // If we have reason to believe that, at this point in our code,
  // name should never be empty, we should assert it.
  // (If we weren't sure, then we should add an if check)
  std.debug.assert(name.len > 0);

  name[0] = std.ascii.toUpper(name[0]);
}
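Because normalize takes a []u8, the caller has to supply mutable memory: a string literal such as "leto" is const and won't coerce to []u8. Here's a minimal sketch of a caller, assuming the name starts out as a literal that we first copy onto the stack:

```zig
const std = @import("std");

fn normalize(name: []u8) void {
    std.debug.assert(name.len > 0);
    name[0] = std.ascii.toUpper(name[0]);
}

pub fn main() void {
    // Dereferencing the literal copies its bytes into a mutable [4:0]u8 array.
    var buf = "leto".*;
    // &buf is a *[4:0]u8, which coerces to the []u8 that normalize expects.
    normalize(&buf);
    std.debug.print("{s}\n", .{&buf});
}
```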

The function mutates the name in place, which is possible because the name parameter is declared as a []u8 instead of a []const u8. (This is a valid real-world pattern, but it would be equally common for the normalization process to dupe the input and mutate the duplicate, in which case name could be a []const u8.) The next step in our normalization is to trim spaces. This requires changing our normalize function to return a new slice:

fn normalize(name: []u8) []u8 {
  std.debug.assert(name.len > 0);

  name = std.mem.trim(u8, name, " ");
  name[0] = std.ascii.toUpper(name[0]);
  return name;
}

This code doesn't compile because we're trying to assign a value to name, and parameters are always const. So we make a small modification and introduce a local variable. Easy, right?

var trimmed = std.mem.trim(u8, name, " ");

// this line will cause an error
trimmed[0] = std.ascii.toUpper(trimmed[0]);

return trimmed;

Just like the example in Learning Zig, this code doesn't compile. We get error: cannot assign to constant on the line that tries to uppercase the first letter. To me, at first glance, the problem isn't obvious. trimmed is a var, and it's a slice into name which, as before, is a []u8, not a []const u8. What gives?

To understand the issue, we need to look at the definition of std.mem.trim:

pub fn trim(comptime T: type, slice: []const T, strip: []const T) []const T

This is a generic function, which is to say it can work on any type. As in this example, there's a good chance that you'll only ever use trim where T == u8. We can make this code a little less abstract by imagining the implementation generated for u8:

pub fn trim(slice: []const u8, strip: []const u8) []const u8

Both inputs are of type []const u8, which makes sense. Neither slice nor strip is mutated (the function returns a sub-slice of slice). Whenever possible, you should make function parameters const. Not only can this enable optimizations, it also means the function can be used with both non-const and const inputs. Because a non-const value can always be safely treated as const, Zig coerces it implicitly. By having slice be a []const u8, trim is able to operate on both []u8 and []const u8 values.

Our issue isn't with the input parameters, it's with the return type, which is also []const u8. If we go back to our code, we can now see why Zig refused to compile when we tried to write into trimmed[0]: the value returned by trim is a []const u8. Although we declared trimmed as var, this only means we can mutate the slice itself (i.e. we can change its length, or change where it points to). The data it points to is still const, because that's what trim's return type says.
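A small sketch of both points, assuming a mutable stack buffer as the input: trim happily accepts a []u8, but what comes back is typed []const u8. Because trimmed is a var, we can re-slice it; writing through it is what the compiler rejects.

```zig
const std = @import("std");

pub fn main() void {
    var buf = "  leto  ".*;
    const name: []u8 = &buf;

    // trim accepts our []u8 (it coerces to []const u8)...
    var trimmed = std.mem.trim(u8, name, " ");
    // ...but the returned slice is typed []const u8, regardless of the input.
    comptime std.debug.assert(@TypeOf(trimmed) == []const u8);

    // Allowed: var means the slice itself (ptr and len) can change.
    trimmed = trimmed[1..];

    // Not allowed: as far as the compiler knows, the data is const.
    // trimmed[0] = 'E'; // error: cannot assign to constant

    std.debug.print("{s}\n", .{trimmed});
}
```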

@constCast

The simplest solution is to use @constCast to strip away the const. This works:

const std = @import("std");

fn normalize(name: []u8) []u8 {
  const trimmed = @constCast(std.mem.trim(u8, name, " "));
  trimmed[0] = std.ascii.toUpper(trimmed[0]);
  return trimmed;
}
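To see why the cast is sound here, consider a caller (the buffer contents are just an example): the slice returned by normalize is a view into the caller's buffer, so the uppercase write lands in memory that really is mutable.

```zig
const std = @import("std");

fn normalize(name: []u8) []u8 {
    // trim returns a []const u8 view into name; name is mutable, so casting const away is sound.
    const trimmed = @constCast(std.mem.trim(u8, name, " "));
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    return trimmed;
}

pub fn main() void {
    var buf = "  leto  ".*;
    const result = normalize(&buf);
    // result aliases buf, so the original buffer now holds "  Leto  ".
    std.debug.print("[{s}] [{s}]\n", .{ result, &buf });
}
```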

@constCast is similar to @ptrCast and @alignCast, which I've talked about in Zig Interfaces and Tiptoeing Around @ptrCast. All three are tools to override the compiler. An important part of the compiler's job is to know the type of data and make sure our manipulations of that data are valid. @constCast is probably the simplest of the three. It tells the compiler: I know you think this is const, but trust me, it isn't. This is dangerous because, if you're wrong, what would have been a compile-time error becomes undefined behavior.

We can easily see this in action. This code won't compile because Zig knows that name is const and won't let us write to it. String literals are always constants:

pub fn main() !void {
  const name = "leto";
  name[0] = 'L';   // error: cannot assign to constant
  std.debug.print("{s}\n", .{name});
}

Like our trimmed variable, we can try to define name as var, but we'll get the same error. We've made the variable itself mutable (a string literal is a pointer to an array, so we could point name at a different string), but the underlying data is still const:

pub fn main() !void {
  // const changed to var
  var name = "leto";
  name[0] = 'L';   // error: cannot assign to constant
  std.debug.print("{s}\n", .{name});
}

But this version with @constCast will compile:

const std = @import("std");

pub fn main() !void {
  const name = @constCast("leto");
  name[0] = 'L';
  std.debug.print("{s}\n", .{name});
}

Try to run this code, though, and it will almost certainly crash, since string literals are usually placed in read-only memory. @constCast and its siblings are unsafe, and @constCast tends to have fewer legitimate use cases than the others. Some people would say you should never use it. Others, myself included, think the world isn't perfect, libraries aren't perfect (which I'd argue std.mem.trim is a good example of), and it's a useful tool to have. But if you do use it, or its siblings, you must understand the distinction between changing the compiler's perspective and changing reality. @constCast merely changes the compiler's perspective; reality remains unchanged. If you're wrong, you've introduced undefined behavior, and a crash is the best you can hope for.
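If what you actually need is a writable copy of a string literal, the alternative is to change reality rather than the compiler's perspective: copy the bytes somewhere writable. A minimal sketch of two ways to do that, a stack array and an allocator dupe (the latter is what you'd reach for when the length is only known at runtime):

```zig
const std = @import("std");

pub fn main() !void {
    // Option 1: dereference the literal to copy its bytes into a stack array we own.
    var on_stack = "leto".*;
    on_stack[0] = 'L';

    // Option 2: dupe the bytes into heap memory that we own and must free.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();
    const on_heap = try allocator.dupe(u8, "ghanima");
    defer allocator.free(on_heap);
    on_heap[0] = 'G';

    std.debug.print("{s} {s}\n", .{ &on_stack, on_heap });
}
```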

If we go back to our non-working example, we can see how @constCast is a reasonable solution:

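const std = @import("std");

fn normalize(name: []u8) []u8 {
  std.debug.assert(name.len > 0);

  // @constCast added here; the result goes into a new local,
  // since parameters (like name) can't be reassigned
  const trimmed = @constCast(std.mem.trim(u8, name, " "));
  trimmed[0] = std.ascii.toUpper(trimmed[0]);
  return trimmed;
}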

I say it's "reasonable" because we know name is mutable and we know trim returns a slice into name. I can't think of any reasonable future change to trim that would make this unsafe.
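If you'd rather not rely on that reasoning silently, one option (a sketch, not something trim itself requires) is to assert the assumption in safe builds before casting: check that the slice trim returned lies inside name. The sketch also returns early on an empty result, since trimming an all-space name leaves nothing to uppercase; that early return is my own choice, not part of the original normalize.

```zig
const std = @import("std");

fn normalize(name: []u8) []u8 {
    const view = std.mem.trim(u8, name, " ");

    // Encode the assumption behind the cast: view must lie inside name.
    const start = @intFromPtr(name.ptr);
    const view_start = @intFromPtr(view.ptr);
    std.debug.assert(view_start >= start and view_start + view.len <= start + name.len);

    const trimmed = @constCast(view);
    // An all-space name trims down to an empty slice; leave it alone.
    if (trimmed.len == 0) return trimmed;
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    return trimmed;
}

pub fn main() void {
    var buf = " ghanima ".*;
    std.debug.print("{s}\n", .{normalize(&buf)});
}
```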

Alternatives?

If we look at trim's signature again:

pub fn trim(comptime T: type, slice: []const T, strip: []const T) []const T

It's tempting to think that we could change the return type to not be const:

pub fn trim(comptime T: type, slice: []const T, strip: []const T) []T

This works in our specific case, where the input slice is mutable and thus the returned slice can be mutable. But now we've dangerously broken the other case, where the input slice is not mutable. Consider this common call: trim(u8, " Leto ", " "). To return a []T, trim would have to @constCast a view into a string literal, i.e. const data, and hand it back as mutable. As we just saw, that might compile, but writing through the result will crash.
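One narrower alternative, if you only care about the mutable case, is a separate function that takes a []u8 and returns a []u8. It needs no @constCast because it re-slices its mutable input directly, and the literal case becomes a compile error instead of a crash. A sketch; trimMut is a hypothetical name, and its loop simply skips leading and trailing bytes found in strip, the same strategy std.mem.trim uses:

```zig
const std = @import("std");

// Hypothetical helper: a trim for mutable input that returns mutable output.
fn trimMut(slice: []u8, strip: []const u8) []u8 {
    var begin: usize = 0;
    var end: usize = slice.len;
    // Skip leading bytes that appear in strip.
    while (begin < end and std.mem.indexOfScalar(u8, strip, slice[begin]) != null) : (begin += 1) {}
    // Skip trailing bytes that appear in strip.
    while (end > begin and std.mem.indexOfScalar(u8, strip, slice[end - 1]) != null) : (end -= 1) {}
    // Slicing a []u8 yields a []u8, so mutability is preserved without a cast.
    return slice[begin..end];
}

pub fn main() void {
    var buf = "  leto  ".*;
    const trimmed = trimMut(&buf, " ");
    trimmed[0] = std.ascii.toUpper(trimmed[0]);
    std.debug.print("{s}\n", .{trimmed});
    // trimMut(" Leto ", " ") would not compile: a string literal can't coerce to []u8.
}
```

The cost is duplication: covering both the const and mutable cases this way means maintaining two functions.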

This can be solved in Zig, but it isn't trivial and isn't something I'd feel comfortable doing (let alone explaining). The solution might involve using anytype instead of an explicit comptime T parameter. Something like:

pub fn trim(slice: anytype, strip: ???) @TypeOf(slice)

Now if slice is given as a []const T, our return is []const T, and if slice is a []T, then the return is []T. That's promising. However, if we called trim(" Ghanima ", " "), then @TypeOf(slice) == *const [9:0]u8, which isn't the return type we want. And we still don't have a type for strip.
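This is easy to confirm: a string literal isn't a slice at all, but a pointer to a const, null-terminated array, which only coerces to a slice when one is expected. A quick check:

```zig
const std = @import("std");

pub fn main() void {
    const literal = " Ghanima ";
    // A literal's type is a single-item pointer to a sentinel-terminated array.
    comptime std.debug.assert(@TypeOf(literal) == *const [9:0]u8);

    // It coerces to []const u8 when a slice type is requested explicitly.
    const slice: []const u8 = literal;
    std.debug.print("{s} -> {s}\n", .{ @typeName(@TypeOf(literal)), @typeName(@TypeOf(slice)) });
}
```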

Our solution needs to get more complicated. Something like:

pub fn trim(slice: anytype, strip: TrimStrip(@TypeOf(slice))) TrimReturn(@TypeOf(slice))

Now we can write functions, TrimStrip and TrimReturn, to generate the correct type for strip and our return. These are just normal functions, but they'll be evaluated at comptime (types always have to be known at comptime). In Zig, types and things which return types are, by convention, PascalCase (which is also why the built-in function is @TypeOf instead of @typeOf).

This would be my implementation of TrimReturn:

fn TrimReturn(comptime T: type) type {
  switch (@typeInfo(T)) {
    .Pointer => |ptr| switch (ptr.size) {
      .Slice => return if (ptr.is_const) T else []ptr.child,
      .One => switch (@typeInfo(ptr.child)) {
        .Array => return if (ptr.is_const) []const std.meta.Elem(ptr.child) else []std.meta.Elem(ptr.child),
        else => {},
      },
      else => {},
    },
    else => {},
  }
  @compileError("expected a slice, got: " ++ @typeName(T));
}

Again, this type of comptime programming isn't something I'm confident about. I might be missing cases that should or should not be allowed, or mappings which aren't right. The three empty else cases are for unsupported types; they fall through to the @compileError, which causes the compiler to emit an error. The @typeInfo built-in returns a tagged union, currently consisting of 24 possible values (like Int or Fn). Here we're only interested in the Pointer type, which has 4 sub-types based on its size field: Slice, One, Many and C. We rely on the is_const and child fields of the Pointer type to generate the correct return type. The goal, at this point, isn't to provide a working example (sorry, I wish it could be), but rather to give some insight into this type of comptime programming.
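To get a feel for the fields TrimReturn inspects, here's a small sketch that prints size, is_const and child for a few types. describe is a hypothetical helper, and the code assumes Zig 0.13, which uses the capitalized tag names (.Pointer, .Slice, .One) seen above; Zig 0.14 renamed these to lowercase.

```zig
const std = @import("std");

// Hypothetical helper: print the Pointer fields that TrimReturn relies on.
fn describe(comptime T: type) void {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| std.debug.print("{s}: size={s} is_const={} child={s}\n", .{
            @typeName(T),
            @tagName(ptr.size),
            ptr.is_const,
            @typeName(ptr.child),
        }),
        else => std.debug.print("{s}: not a pointer\n", .{@typeName(T)}),
    }
}

pub fn main() void {
    describe([]u8);
    describe([]const u8);
    describe(@TypeOf(" Ghanima "));
    describe([*]u8);
    describe(u8);
}
```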

For completeness, where TrimReturn makes the const-ness of the return type match the const-ness of slice, TrimStrip would make it always const. As such, it would be generally similar to the above, but changed slightly:

fn TrimStrip(comptime T: type) type {
  switch (@typeInfo(T)) {
    .Pointer => |ptr| switch (ptr.size) {
                                                // always const
      .Slice => return if (ptr.is_const) T else []const ptr.child,
      .One => switch (@typeInfo(ptr.child)) {
                         // always const
        .Array => return []const std.meta.Elem(ptr.child),
        else => {},
      },
      else => {},
    },
    else => {},
  }
  @compileError("expected a slice, got: " ++ @typeName(T));
}
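For the curious, here's one way the pieces could fit together, under the same caveats as above. It's a sketch: trimAny is a hypothetical name (to avoid confusion with std.mem.trim), it assumes Zig 0.13's tag names, and rather than calling std.mem.trim, which would hand back a []const, it re-slices the coerced input so the result keeps slice's const-ness.

```zig
const std = @import("std");

fn TrimReturn(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return if (ptr.is_const) []const std.meta.Elem(ptr.child) else []std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}

fn TrimStrip(comptime T: type) type {
    switch (@typeInfo(T)) {
        .Pointer => |ptr| switch (ptr.size) {
            .Slice => return if (ptr.is_const) T else []const ptr.child,
            .One => switch (@typeInfo(ptr.child)) {
                .Array => return []const std.meta.Elem(ptr.child),
                else => {},
            },
            else => {},
        },
        else => {},
    }
    @compileError("expected a slice, got: " ++ @typeName(T));
}

// Hypothetical anytype trim: the result is mutable only if `slice` was.
// Note: const sentinel slices (e.g. [:0]const u8) won't compile here,
// one of the possibly missing cases mentioned above.
fn trimAny(slice: anytype, strip: TrimStrip(@TypeOf(slice))) TrimReturn(@TypeOf(slice)) {
    const R = TrimReturn(@TypeOf(slice));
    const E = std.meta.Elem(R);
    // Coerce pointers-to-arrays (e.g. string literals) to a slice first.
    const s: R = slice;
    var begin: usize = 0;
    var end: usize = s.len;
    while (begin < end and std.mem.indexOfScalar(E, strip, s[begin]) != null) : (begin += 1) {}
    while (end > begin and std.mem.indexOfScalar(E, strip, s[end - 1]) != null) : (end -= 1) {}
    return s[begin..end];
}

pub fn main() void {
    // A literal in, a []const u8 out.
    const name = trimAny(" Ghanima ", " ");
    comptime std.debug.assert(@TypeOf(name) == []const u8);

    // A mutable buffer in, a []u8 out: writing through the result compiles.
    var buf = "  leto  ".*;
    const mutable = trimAny(&buf, " ");
    comptime std.debug.assert(@TypeOf(mutable) == []u8);
    mutable[0] = std.ascii.toUpper(mutable[0]);

    std.debug.print("{s} {s}\n", .{ name, mutable });
}
```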

Conclusion

While we took a detour into a rough introduction to comptime programming, the main goal of this post was to introduce @constCast. To me the interesting part isn't having @constCast as a tool, but rather seeing, interacting with, and changing the compiler's perspective. It seems delicate. Not as in frail, because I think runtime bugs caused by compiler bugs are shockingly rare, but as in beautiful. The compiler can infer so much from so little, and use it to enforce correctness.

Having escape hatches, whether through functions like @constCast or the unsafe blocks found in other languages, can be essential. Be mindful of what those escape hatches are and aren't doing: they aren't changing the data, they're just changing how the compiler treats it. And remember that you're replacing compile-time safety with the possibility of undefined behavior at runtime, a trade-off no one should want.