home

Writing And Using C Code From Go

20 Mar 15

Yesterday's post introduced a specialized integer set which took 1/2 the space as a map[int]struct{} and generally did membership checks (Exists) in 1/2 the time. The next (logical?) step is to see what would happen if we rewrote the code in C and called it from Go. A word of warning: my C isn't that sharp.

The first thing we'll do is add a header file which will contain our IntSet type definition as well as the functions we'll be calling from Go (this goes in a file called intset.h which sits besides our Go code):

#ifndef INTSET_H
#define INTSET_H

#include "vector.h"

#define BUCKET_SIZE 32

typedef struct IntSet{
int mask;
int length;
Vector** buckets;
} IntSet;

IntSet* intset_new(long long size);
void intset_set(IntSet* set, long long value);
int intset_exists(IntSet* set, long long value);
void intset_free(IntSet* set);

#endif

The vector type is just a simple dynamic array implementation which grows by a fixed amount as needed. The key here is that we've defined a type and four functions. How do we call these in Go? It's quite simple: inside of our Go code, we add:

/*
#include "intset.h"
*/

import "C"

It's very important that the import "C" line be placed immediately after the c-style include (and go fmt won't help you with this).

While we still need to implement the C code, the above is the big piece needed to glue Go and C together. With it, we have a package C with a type called C.IntSet and a function called C.intset_new. C isn't the same as other packages, but as far as how it's used, it's very similar. Assuming we had an implementation, we could do:

set := C.intset_new(10000)
defer C.inset_free(set)
fmt.Println(C.intset_exists(set, 492))
C.intset_set(set, 492)
fmt.Println(C.intset_exists(set, 492))

Now, this works nicely because our values are all constants and thus we get the proper typing at compile time. However, if we were dealing with variables, a C int doesn't map directly to an Go int, and a char* certainly doesn't map directly to a string. A more realistic example would look like:

set := C.intset_new(C.longlong(total))
defer C.inset_free(set)
fmt.Println(int(C.intset_exists(set, c.longlong(value))))
...

A lot more type casting. With strings, we'd also be responsible for freeing the generated char*. It'd be nice if we could shield consumers from knowing that our implementation is written in C. Well, since C.IntSet is a normal Go type, we can do:

type IntSet C.IntSet

func NewIntSet(size uint) *IntSet {
return (*IntSet)(C.intset_new(C.longlong(size)))
}

func (s *IntSet) Set(value int) {
C.intset_set((*C.IntSet)(s), C.longlong(value))
}

func (s *IntSet) Len() int {
//notice how we can directly access a structure's field. Yay.
return int(s.length)
}

func (s *IntSet) Exists(value int) bool {
return C.intset_exists((*C.IntSet)(s), C.longlong(value)) == 1
}

func (s *IntSet) Close() {
C.intset_free((*C.IntSet)(s))
}

Hopefully you didn't miss the fact that we're responsible for any memory our C code allocates. Go's garbage collector doesn't know anything about it.

While I haven't shown any implementation, it's essentially the same algorithm as the Go version. How does it compare performance wise?

go-intset populate 50000000        26.4 ns/op
go-intset dense 50000000 24.3 ns/op
go-intset sparse 100000000 22.3 ns/op

go-map populate 10000000 157 ns/op
go-map dense 20000000 71.7 ns/op
go-map sparse 20000000 109 ns/op

c-intset populate 10000000 247 ns/op
c-intset dense 10000000 223 ns/op
c-intset sparse 10000000 237 ns/op

You can see that the original Go code is the fastest, while the built-in map comes in second. The C code is quite a bit slower. Why? Because of the overhead of calling into C. This type of code, where you're doing very small amounts of work, cannot make up for the overhead. However, what happens if instead of testing individual membership we do a basic intersection of two large sets?

go-intset intersect     100    19488104 ns/op
go map intersect 10 127265505 ns/op
c-intset intersect 100 14565680 ns/op

This essentially turns 10000000 call in and out of C into 1 call which then does a lot more work.

I've put all the working code, and benchmarks, in a branch. The readme has basic instructions.