home

String to Integer (atoi) in Go

11 Jan 23

In Go, strconv is the go-to package for various string conversions. When we want to convert a string to an integer, the ParseInt and closely related Atoi functions are the two functions we're most likely to use:

import (
"fmt"
"strconv"
)

func main() {
n, err := strconv.Atoi("9001")
if err != nil {
panic(err)
}
fmt.Println(n)
}

The above code will parse the string "9001" and store the integer value in the variable n (it also prints that integer value, i.e. 9001).

Based on Atoi's documentation which simply states Atoi is equivalent to ParseInt(s, 10, 0), converted to type int, you'd be forgiven for thinking that Atoi is a thin wrapper around ParseInt. But consider this benchmark:

const (
INPUT = "1234567890"
EXPECTED = 1234567890
)

func Benchmark_ParseInt(b *testing.B) {
for n := 0; n < b.N; n++ {
x, _ := strconv.ParseInt(INPUT, 10, 32)
if x != EXPECTED {
panic("invalid")
}
}
}

func Benchmark_Atoi(b *testing.B) {
for n := 0; n < b.N; n++ {
x, _ := strconv.Atoi(INPUT)
if x != EXPECTED {
panic("invalid")
}
}
}

And the results:

Benchmark_ParseInt-8        16.36 ns/op
Benchmark_Atoi-8 6.477 ns/op

The reason Atoi is so much faster is that, for common cases, Atoi will do the conversion itself (in what the inline-documentation calls the "fast path"). This "fast path" is easy to understand and likely what you'd come up with if you had to parse a string yourself:

// not exactly Atoi's fast-path, but simplified for the purpose of this post
var n int
for _, b := range []byte(input) {
b -= '0'
if b > 9 {
return errors.New("invalid integer")
}
n = n*10 + int(b)
}

Go's version also handles negative values (by checking if the first character == '-' and, if so, inverting the final result). ParseInt on the other hand is more flexible and powerful - supporting different bases and underscored integer literals. Thus, Atoi and ParseInt are only "equivalent" in certain cases and in a certain light (i.e. when we don't consider performance).

Both Atoi and ParseInt are similar in anoter way: they will only parse a string if it only contains an integer. This is different than C's atoi function which converts the initial portion of the string. In other words, in Go, Atoi will return an error when given "9000!". However in C, atoi will return 9000.

As far as I can tell, the "recommended" way to parse "9000!" in Go is to use the fmt.Sscan function. But, Sscan is a general-purpose function capable of scanning arbitrarily complex inputs. This power comes at a cost: it's really slow. For our case, where we want to convert the initial portion of a string into an integer (or, put differently, when our integer has a suffix), we can write our own function which is largely based on Go's built-in Atoi logic :

func atoi(input string) (int, string) {
var n int
for i, b := range []byte(input) {
b -= '0'
if b > 9 {
return n, input[i:]
}
n = n*10 + int(b)
}
return n, ""
}

This implementation has the benefit of returning the remainder of the input string that was not a valid integer. This is a pretty common requirement when parsing a string, for example, if we wanted to extract a value with a unit (e.g. 100m).

Now our atoi function is naive. Given a value larger than an integer can hold, our atoi will silently overflow (Go's fmt.Sscan, strconv.Atoi and strconv.ParseInt will return errors). Also, our atoi doesn't support negative values - though that would be easy to add. Still, as a baseline, if we compare it to Sscan using the following:

var cases = []struct {
input string
expected int
remainder string
name string
}{
{"0", 0, "", "zero"},
{"9000!", 9000, "!", "small"},
{"272833319 km", 272833319, " km", "large"},
{"nope", 0, "nope", "invalid"},
}

func Benchmark_Sscan(b *testing.B) {
for _, c := range cases {
b.Run(fmt.Sprintf("sscan_%s", c.name), func(b *testing.B) {
for i := 0; i < b.N; i++ {
var actual int
fmt.Sscan(c.input, &actual)
if int(actual) != c.expected {
panic("invalid")
}
}
})
}
}

func Benchmark_Atoi(b *testing.B) {
for _, c := range cases {
b.Run(fmt.Sprintf("custom_atoi_%s", c.name), func(b *testing.B) {
for i := 0; i < b.N; i++ {
actual, remainder := atoi(c.input)
if actual != c.expected || remainder != c.remainder {
panic("invalid")
}
}
})
}
}

func atoi(input string) (int, string) {
var n int
for i, b := range []byte(input) {
b -= '0'
if b > 9 {
return n, input[i:]
}
n = n*10 + int(b)
}
return n, ""
}

We see a massive performance difference:

Benchmark_Sscan/sscan_zero-8             160.9 ns/op
Benchmark_Atoi/custom_atoi_zero-8 2.355 ns/op

Benchmark_Sscan/sscan_small-8 213.9 ns/op
Benchmark_Atoi/custom_atoi_small-8 4.413 ns/op

Benchmark_Sscan/sscan_large-8 307.0 ns/op
Benchmark_Atoi/custom_atoi_large-8 6.696 ns/op

Benchmark_Sscan/sscan_invalid-8 371.2 ns/op
Benchmark_Atoi/custom_atoi_invalid-8 3.450 ns/op

Maybe there's a better built-in option than fmt.Sscan, but I couldn't find it. Do I think you should write your own string to integer function? Absolutely, if the built-in functions don't do what you like (either in terms of functionality or performance) AND you're willing to live with the edge cases that your implementation is likely fail on (possibly silently, as the above atoi function does with very large integers).