Error Handling in Go - Some Subtleties

Error handling is more complex than it first seems, even in simple programs.

Author

John Bates

Published

September 8, 2023

As is the case in most other programming languages, good error handling in Go is difficult. In this document I record some of the issues that I have had to address in writing even simple programs.

Responsibility

Consider this simple main function that calls ‘readWorkbook’ to read a workbook from disk into a structure. The function returns two values: a populated workbook structure and a Go error. This is idiomatic Go: if a function needs to return an error, it is normally passed back as the last returned value.

A nil error is conventionally interpreted to mean that the function worked and so the code will normally check first for a nil error before using the other returned values. In our example, if we know that there is no error we can safely use the other values returned by our function: wb.sheets and wb.name.

package main

import (
    "fmt"
    "io"
    "os"
    "strings"
)

func main() {
    wbPath := "./recipies.xlsx"
    wb, err := readWorkbook(wbPath)
    if err != nil {
        fmt.Printf("Error in '%s': %v\n", wbPath, err)
    } else {
        fmt.Printf("Read %d sheets from workbook '%s'\n", len(wb.sheets), wb.name)
    }
}

But there is a problem. If recipies.xlsx does not exist in the current directory we get:

Error in './recipies.xlsx': open ./recipies.xlsx: no such file or directory

The filename appears twice because the caller has no way of knowing that the called function is going to include the name of the file in ‘err’, let alone whether it does so in all of its potential error messages.

Our added message is also somewhat misleading. We say that there was an error ‘in’ the file, but the error in this case was in opening the file and had nothing to do with the contents of this file.

All of this suggests that we might be better off pushing the reporting of the filename up into the readWorkbook function.

This is the original definition of readWorkbook.

type workbook struct {
    name   string
    sheets []*sheet
}

type sheet struct {
    name string
    rows []*row
}

type row struct {
    cells []string
}

func readWorkbook(name string) (*workbook, error) {
    file, err := os.Open(name)
    if err != nil {
        return nil, err
    }
    bytes, err := io.ReadAll(file)
    if err != nil {
        return nil, err
    }
    sheets, err := readSheets(bytes)
    if err != nil {
        return nil, err
    }
    return &workbook{name, sheets}, nil
}

We can see that the library function os.Open is able to return the filename in its error message but io.ReadAll and readSheets will know nothing about the filename.

If we push the responsibility for adding the workbook name up into the readWorkbook function we get:

func main() {
    wbPath := "./recipies.xlsx"
    wb, err := readWorkbook(wbPath)
    if err != nil {
        fmt.Println(err)
    } else {
        fmt.Printf("Read %d sheets from workbook '%s'\n", len(wb.sheets), wb.name)
    }
}
...
func readWorkbook(name string) (*workbook, error) {
    fail := func(op string, err error) (*workbook, error) {
        return nil, fmt.Errorf("read work book (%s): %w", op, err)
    }
    file, err := os.Open(name)
    if err != nil {
        return fail("open", err)
    }
    bytes, err := io.ReadAll(file)
    if err != nil {
        return fail("read "+name, err)
    }
    sheets, err := readSheets(bytes)
    if err != nil {
        return fail("sheets "+name, err)
    }
    return &workbook{name, sheets}, nil
}

Which, assuming that readSheets can return us a “bad format” error message, will give us error messages like this:

read work book (open): open ./recipies.xlsx: no such file or directory
read work book (sheets ./recipies.xlsx): bad format

These don’t repeat the filename but do allow us to determine the location and kind of the error as well as passing the low level error detail back.

The ‘%w’ verb in the format string passed to fmt.Errorf in fail wraps the original error inside the new error, so that code receiving the new error can still retrieve the original with errors.Unwrap, errors.Is or errors.As.

This change has made readWorkbook slightly messier, but the calling function is simpler. By generating the error message as close to the source of the error as we can, we have eliminated the chance of duplicating information in the message and reduced the work required at any other site that calls readWorkbook.

In the article ‘Error handling and Go’ on the Go Blog, Andrew Gerrand writes that:

“It is the error implementation’s responsibility to summarize the context.”

I interpret that, in this context, as meaning that we should construct the error message as close to the cause of the error as possible, while still retaining as much context as the message needs.

Traceability

The Google Style Guide for Go says that:

Error strings should not be capitalized (unless beginning with an exported name, a proper noun or an acronym) and should not end with punctuation. This is because error strings usually appear within other context before being printed to the user.

This context is what helps the reader of the error message understand more about the route through the code that arrived at the error.

From our example error message (above):

read work book (open): open ./recipies.xlsx: no such file or directory

As we read from left to right, we home in on the source of the actual error. Here, we might infer that in attempting to read a workbook, the open operation of ./recipies.xlsx failed with a no such file or directory error from the operating system.

To a user of the software this might be sufficient to help them work out how to resolve the problem. However, a software developer will usually want more traceability. The Go language makes it easy to construct error messages that carry much more detail by hand; one nice package that does this for us is the errgo package developed by Roger Peppe.

Errgo will trace the chain of error returns from its source to the function in which the details of the error are presented. It does this by wrapping the original error with source file and line number information at each point that the error gets returned by a function. This is best seen by example.

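Consider the rest of the workbook example, now modified to use errgo. The fail function in readWorkbook now uses errors.Notef to wrap the underlying error. Note that errors in this listing refers to errgo's own errors package, not to the standard library package of the same name.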

func readWorkbook(name string) (*workbook, error) {
    fail := func(op string, err error) (*workbook, error) {
        return nil, errors.Notef(err, nil, "read work book (%s)", op)
    }
    file, err := os.Open(name)
    if err != nil {
        return fail("open", err)
    }
    bytes, err := io.ReadAll(file)
    if err != nil {
        return fail("read "+name, err)
    }
    sheets, err := readSheets(bytes)
    if err != nil {
        return fail("sheets "+name, err)
    }
    return &workbook{name, sheets}, nil
}

func readSheets(bytes []byte) ([]*sheet, error) {
    sr, err := NewSheetReader(bytes)
    if err != nil {
        return nil, errors.Wrap(err)
    }
    sheets := []*sheet{}
    for name, bytes := range sr {
        rows, err := readRows(bytes)
        if err != nil {
            return nil, errors.Wrap(err)
        }
        sheets = append(sheets, &sheet{name, rows})
    }
    return sheets, nil
}

func readRows(bytes []byte) ([]*row, error) {
    rr, err := NewRowReader(bytes)
    if err != nil {
        return nil, errors.Wrap(err)
    }
    rows := []*row{}
    for _, bytes := range rr {
        cells, err := readCells(bytes)
        if err != nil {
            return nil, errors.Wrap(err)
        }
        rows = append(rows, &row{cells})
    }
    return rows, nil
}

func readCells(bytes []byte) ([]string, error) {
    cr, err := NewCellReader(bytes)
    if err != nil {
        return nil, errors.Wrap(err)
    }
    cells := []string{}
    for _, s := range cr {
        cells = append(cells, s)
    }
    return cells, nil
}

// NewSheetReader splits the 'bytes' that make up a workbook
// into a map of bytes that make up each sheet indexed by sheet name.
// This implementation hardwires the worksheet.
func NewSheetReader(bytes []byte) (map[string][]byte, error) {
    sheets := map[string][]byte{
        "Sheet1": []byte("A1,B1/A2,B2,C2/A3"),
        "Sheet2": []byte("A1"),
    }
    return sheets, nil
}

// NewRowReader splits the 'bytes' that make up a sheet
// into a slice of bytes that make up each row.
func NewRowReader(bytes []byte) ([][]byte, error) {
    rowstrings := strings.Split(string(bytes), "/")
    rows := [][]byte{}
    for _, row := range rowstrings {
        rows = append(rows, []byte(row))
    }
    if true { // force an error here
        return nil, errors.New("bad format row")
    }
    return rows, nil
}

// NewCellReader splits the 'bytes' that make up a row
// into a slice of strings, one for each cell in the row.
func NewCellReader(bytes []byte) ([]string, error) {
    return strings.Split(string(bytes), ","), nil
}

The readSheets, readRows and readCells functions now call errors.Wrap with an error before passing the result back down the call stack. The effect of this is to have information about the source code file and line number added to the error as it is passed down through each function.

For the purposes of generating an error message we have introduced a fictional “bad format row” error in the NewRowReader function.

If in the code where we display the error message we use

fmt.Printf("%#v", err)

in place of:

fmt.Println(err)

Then rather than our simple error message of:

read work book (sheets ./recipies.xlsx): bad format row

instead we see:

[
        {/Users/john/Projects/GoErrors/main.go:44: read work book (sheets ./recipies.xlsx)}
        {/Users/john/Projects/GoErrors/main.go:71: }
        {/Users/john/Projects/GoErrors/main.go:81: }
        {/Users/john/Projects/GoErrors/main.go:126: bad format row}
]

Which gives us a traceable route through our code starting at line 44 in main.go and progressing through lines 71 and 81 to line 126 which is where the error was generated.

Errgo maintains both the normal error message and a detailed stack trace which can be of great diagnostic use to a developer. The programmer needs to devise a mechanism that prevents a user from seeing the detailed message but which allows the developer access - typically this might be through a log file.

Fragile Dependencies

The fmt.Errorf function from the fmt package in the Go standard library can be used to create a new error that wraps an existing error:

fmt.Errorf("read work book (%s): %w", op, err)

When we do this, the format of the new error message is controlled by the format string passed to fmt.Errorf as its first argument. Typically, perhaps idiomatically, the message from the underlying error follows the new message, separated from it by a colon and a space, as above.

However, wrapping an error does more than just augment an error message: it also makes the actual underlying error available to any function receiving the new error.

Suppose we make a call to os.Open to open an operating system file. In checking for an error we might either choose to return the error we received from os.Open or we might choose to wrap it in another message and return the wrapped error, as here:

file, err := os.Open(name)
if err != nil {
    err = fmt.Errorf("open file: %w", err)
    return err
}

In this situation the calling code has access to the original error, including the name of the file that we attempted to open.

The standard library function errors.As can be used to interrogate the chain of wrapped errors to see if it contains an error of a particular type. For example, if an error occurs in a call to os.Open then the error that is returned will be of type *fs.PathError (os.PathError is an alias for the same type). No matter how deep the chain of wrapped errors, a call to errors.As can be used to extract the PathError and its values.

var perr *fs.PathError
if errors.As(err, &perr) {
    fmt.Println("err", perr.Path)
}

This might not always be desirable. For example, if you are writing an API you will not want to leak what might be seen as implementation detail out into errors that users of the API might choose to rely on.

The errgo package takes a tougher stance on the construction of the wrapped error stack. If an error is wrapped by the errgo package using, say, either errors.Notef or errors.Wrap as above, then calls to the standard library errors.Is and errors.As functions will always return false for the underlying error, and so no such interrogation will be possible.

If a function wishes to return information about an error and its cause then it has to be quite specific about how it constructs that error. It can make use of the second argument to errors.Notef to specify which types of error can be exposed as a cause of the error, for example:

fail := func(op string, err error) (*workbook, error) {
    return nil, errors.Notef(
        err,
        errors.Is(fs.ErrNotExist),
        "read work book (%s)",
        op,
    )
}

Alternatively it can make use of the errors.Becausef function to explicitly supply the cause of the error (in the second argument to errors.Becausef).

fail := func(op string, err error) (*workbook, error) {
    return nil, errors.Becausef(
        err,
        err,
        "read work book (%s)",
        op,
    )
}

In this example, as well as wrapping the underlying error and adding our own message, we have supplied the underlying error as the cause of the error. By doing this, we have allowed code lower down the call stack to examine the cause of the error by calling the errors.Cause function.

wb, err := readWorkbook(wbPath)
if err != nil {
    cause := errors.Cause(err)
    if cause != err {
        // do something with the cause.
    }
}

If there has been no explicit cause assigned to the error then errors.Cause(err) will return its argument, otherwise it will return the cause of the error. Errgo allows us to both construct an error chain and also to preserve the original cause of an error.

Only the errors.Note, errors.Notef, errors.Because and errors.Becausef functions can preserve the error cause; errors.Wrap does not preserve the original cause of an error. This arrangement makes it harder to accidentally pass information about an implementation detail down the error chain, where code lower down might come to depend on it.