Writing And Using C Code From Go

Mar 20, 2015

Yesterday's post introduced a specialized integer set which took 1/2 the space as a map[int]struct{} and generally did membership checks (Exists) in 1/2 the time. The next (logical?) step is to see what would happen if we rewrote the code in C and called it from Go. A word of warning: my C isn't that sharp.

The first thing we'll do is add a header file which will contain our IntSet type definition as well as the functions we'll be calling from Go (this goes in a file called intset.h which sits besides our Go code):

#ifndef INTSET_H
#define INTSET_H

#include "vector.h"

#define BUCKET_SIZE 32

typedef struct IntSet{
  int mask;
  int length;
  Vector** buckets;
} IntSet;

IntSet* intset_new(long long size);
void intset_set(IntSet* set, long long value);
int intset_exists(IntSet* set, long long value);
void intset_free(IntSet* set);


The vector type is just a simple dynamic array implementation which grows by a fixed amount as needed. The key here is that we've defined a type and four functions. How do we call these in Go? It's quite simple: inside of our Go code, we add:

#include "intset.h"
import "C"

It's very important that the import "C" line be placed immediately after the c-style include (and go fmt won't help you with this).

While we still need to implement the C code, the above is the big piece needed to glue Go and C together. With it, we have a package C with a type called C.IntSet and a function called C.intset_new. C isn't the same as other packages, but as far as how it's used, it's very similar. Assuming we had an implementation, we could do:

set := C.intset_new(10000)
defer C.inset_free(set)
fmt.Println(C.intset_exists(set, 492))
C.intset_set(set, 492)
fmt.Println(C.intset_exists(set, 492))

Now, this works nicely because our values are all constants and thus we get the proper typing at compile time. However, if we were dealing with variables, a C int doesn't map directly to an Go int, and a char* certainly doesn't map directly to a string. A more realistic example would look like:

set := C.intset_new(C.longlong(total))
defer C.inset_free(set)
fmt.Println(int(C.intset_exists(set, c.longlong(value))))

A lot more type casting. With strings, we'd also be responsible for freeing the generated char*. It'd be nice if we could shield consumers from knowing that our implementation is written in C. Well, since C.IntSet is a normal Go type, we can do:

type IntSet C.IntSet

func NewIntSet(size uint) *IntSet {
  return (*IntSet)(C.intset_new(C.longlong(size)))

func (s *IntSet) Set(value int) {
  C.intset_set((*C.IntSet)(s), C.longlong(value))

func (s *IntSet) Len() int {
  //notice how we can directly access a structure's field. Yay.
  return int(s.length)

func (s *IntSet) Exists(value int) bool {
  return C.intset_exists((*C.IntSet)(s), C.longlong(value)) == 1

func (s *IntSet) Close() {

Hopefully you didn't miss the fact that we're responsible for any memory our C code allocates. Go's garbage collector doesn't know anything about it.

While I haven't shown any implementation, it's essentially the same algorithm as the Go version. How does it compare performance wise?

go-intset populate 50000000        26.4 ns/op
go-intset dense    50000000        24.3 ns/op
go-intset sparse   100000000       22.3 ns/op

go-map populate    10000000        157 ns/op
go-map dense       20000000        71.7 ns/op
go-map sparse      20000000        109 ns/op

c-intset populate  10000000         247 ns/op
c-intset dense     10000000         223 ns/op
c-intset sparse    10000000         237 ns/op

You can see that the original Go code is the fastest, while the built-in map comes in second. The C code is quite a bit slower. Why? Because of the overhead of calling into C. This type of code, where you're doing very small amounts of work, cannot make up for the overhead. However, what happens if instead of testing individual membership we do a basic intersection of two large sets?

go-intset intersect     100    19488104 ns/op
go map intersect        10    127265505 ns/op
c-intset intersect      100    14565680 ns/op

This essentially turns 10000000 call in and out of C into 1 call which then does a lot more work.

I've put all the working code, and benchmarks, in a branch. The readme has basic instructions.