Rust: Structuring and handling errors in 2020

I recently started learning the Rust programming language by going through "the book", which does a phenomenal job of explaining the language basics.

After working through the book’s main content I got started with my first non-trivial, real-world application. But I soon found myself faced with a question I didn’t yet feel well-equipped to handle:

“How should you structure error handling in a mature Rust application?”

This article describes my journey of discovering the answer to this question. I will try to explain the pattern I’ve settled on together with example code showing its implementation, in the hope that other newcomers may have an easier time getting started.

Intro

While the book goes through the basics of error handling, including use of the std::result::Result type and error propagation with the ? operator, it largely glosses over the different patterns for using these tools in real-world applications and the trade-offs involved with different approaches. ¹

When I began looking into best practices, I came across quite a bit of outdated advice to use the failure crate. Failure had a semi-official feel to it as a result of being in the rust-lang-nursery namespace, but it has recently been deprecated.

There have been a number of improvements to the std::error::Error trait in the past two years. ² These have made failure less necessary in general and have inspired a number of more modern libraries that take advantage of these improvements to offer better ergonomics.

After reading through quite a lot of historical context and evaluating a number of libraries, I’ve now settled on a (largely library-agnostic) pattern for structuring errors, which I implement using the anyhow and thiserror crates. ³

The rest of this article will:

Introduce a relatively trivial word-counting application to explore and explain the problem space.
Explain why applications and libraries should use different error-handling patterns.
Demonstrate how to apply these patterns using anyhow and thiserror.

Counting words

Let’s introduce some example code for use throughout the rest of this article. We’ll build a program to count the number of words in a text file, much like wc -w would do.

A naive implementation with basic error handling using std::Result might look like this:

use std::env;
use std::error::Error;
use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;

/// Count the number of words in the given input.
///
/// Any potential errors, such as being unable to read from the input will be propagated
/// upwards as-is due to the use of `line?` just before `split_whitespace()`.
fn count_words<R: Read>(input: &mut R) -> Result<u32, Box<dyn Error>> {
    let reader = BufReader::new(input);
    let mut wordcount = 0;
    for line in reader.lines() {
        for _word in line?.split_whitespace() {
            wordcount += 1;
        }
    }
    Ok(wordcount)
}

fn main() -> Result<(), Box<dyn Error>> {
    for filename in env::args().skip(1).collect::<Vec<String>>() {
        let mut reader = File::open(&filename)?;

        let wordcount = count_words(&mut reader)?;
        println!("{} {}", wordcount, filename);
    }

    Ok(())
}

Let’s generate an input file for our new word counter and try to run it:

$ fortune > words.txt
$ cargo run --quiet -- words.txt
50 words.txt

If you don’t have a words.txt, however, you’ll encounter the following error:

$ cargo run --quiet -- words.txt
Error: Os { code: 2, kind: NotFound, message: "No such file or directory" }

This error is the result of File::open() returning an error in main().

To make the example complete, let’s also simulate an error in the read() call happening under the hood inside count_words() ⁴ so we can see what that looks like:

$ cargo run --quiet -- words.txt
Error: Custom { kind: BrokenPipe, error: "read: broken pipe" }
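The code used to inject this failure isn't shown here. One way to do it, sketched under that assumption, is a small type whose read() always fails. FailingReader is a hypothetical name chosen for this illustration. The sketch also shows that a byte slice can stand in for a file, since count_words() accepts anything implementing Read.

```rust
use std::error::Error;
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical reader that fails on every call to `read()`.
struct FailingReader;

impl Read for FailingReader {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "read: broken pipe"))
    }
}

/// Same logic as `count_words` above.
fn count_words<R: Read>(input: &mut R) -> Result<u32, Box<dyn Error>> {
    let reader = BufReader::new(input);
    let mut wordcount = 0;
    for line in reader.lines() {
        for _word in line?.split_whitespace() {
            wordcount += 1;
        }
    }
    Ok(wordcount)
}

fn main() {
    // A byte slice implements `Read`, so no file is needed for the happy path.
    let mut text: &[u8] = b"the quick brown fox\njumps over";
    println!("{:?}", count_words(&mut text).map_err(|e| e.to_string()));

    // The failing reader surfaces the injected error on the first read.
    println!("{:?}", count_words(&mut FailingReader).map_err(|e| e.to_string()));
}
```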

Missing context

So what’s wrong with the above error? While the underlying error cause (“broken pipe”) is made clear, we’re missing a lot of context. We can’t tell which file could not be read and there is no information about the sequence of events leading up to this error.

When you think about it, there’s a chain of errors here:

main() returns an error because count_words() returns an error.
count_words() returns an error because we run into an error iterating over reader.lines() (the for loop and its line? expression).
Iterating over reader.lines() errors because we injected an implementation of std::io::Read that fails on the first call to read().

However, we don’t really see this reflected in the error messages above.

In this example the file name is an input argument for the program itself. This makes it easy to correlate the error to the file it was trying to open.

Now imagine an error happening 5 calls deep inside a library within a much larger piece of software. Without any information on the chain of events in such a case, it quickly gets very difficult to understand what might be causing the error.

Libraries versus applications

Earlier on I mentioned two different libraries, anyhow and thiserror (though both are by the same author, dtolnay). You might be wondering why we need two separate libraries to do something as basic as dealing with errors.

It took me a moment to appreciate this distinction, but there’s value in approaching error handling differently between libraries and applications as they tend to have different concerns:

Libraries should focus on producing meaningful, structured error types/variants. This allows applications to easily differentiate various error cases.
Applications mainly consume errors.
Libraries may want to cast errors from one type to another. An IO error should likely be wrapped by a high-level error type provided by the library.
- Otherwise an IO error in library foo cannot be distinguished from a similar IO error in library bar.
- Not doing so also requires the consumer to know library internals. For example, is it just IO errors that might be returned? What about HTTP errors that might originate from an HTTP client internal to the library?
Libraries must be careful when changing errors or creating new errors, as these can easily introduce breaking changes for consumers. They may produce new errors internally, but these are unlikely to require special structure and can be more easily changed at will.
Where libraries return errors, applications decide if and how those errors are formatted and displayed to users.
Applications may also want to parse and inspect errors, for example to forward them to exception tracking services or to retry operations when doing so is deemed to be safe.
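To make the last point a little more concrete, here is a minimal sketch of an application inspecting an error to decide whether a retry is safe. The helper names (is_retryable, with_retries) and the choice of which error kinds count as transient are assumptions made for this illustration only.

```rust
use std::io::{self, ErrorKind};

/// Decide whether an I/O failure is worth retrying.
/// Which kinds count as transient is an assumption made for this sketch.
fn is_retryable(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        ErrorKind::Interrupted | ErrorKind::TimedOut | ErrorKind::WouldBlock
    )
}

/// Run `op` up to `max_attempts` times, retrying only transient failures.
fn with_retries<T>(
    max_attempts: u32,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempt = 1;
    loop {
        match op() {
            // Transient failure with attempts left: go around again.
            Err(err) if attempt < max_attempts && is_retryable(&err) => attempt += 1,
            // Success, a permanent failure, or no attempts left: hand it back.
            other => return other,
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = with_retries(3, || {
        calls += 1;
        if calls < 3 {
            Err(io::Error::new(ErrorKind::TimedOut, "timed out"))
        } else {
            Ok("done")
        }
    });
    println!("{:?} after {} calls", result, calls);
}
```

This kind of inspection only works when the error carries structure (here an ErrorKind), which is the reason libraries are encouraged to expose meaningful variants.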

Additionally, and I think this is quite important, libraries should always use std::result::Result together with an error type implementing std::error::Error in their public APIs. Custom error traits like failure::Fail may not compose well with other parts of your users’ code and force them to learn yet another library.

API boundaries

Coming back to our word counting example, imagine we want to make count_words available as a public library. You wouldn’t normally do this for such a small and simple piece of code, but there can be value in making functionality available through public crates within larger projects.

As a demonstration, we can define boundaries in our word counter to separate this code into a library and an application part.

We’ll extract count_words into a library crate named wordcounter. I’ll highlight relevant parts below, but if you want to skip ahead, you can find the complete src/wordcounter.rs on GitHub.

Everything outside of count_words is our application code. This is going to live in a binary crate which we’ll call rwc (for Rust Word Count — very original, I know). The relevant files for this are src/main.rs and src/lib.rs.

The library error type

For our wordcounter library, we’ll define a top-level error type called WordCountError. This enum has error variants for every possible error that our library might encounter.

This is where thiserror comes into play. While we could implement this by hand, thiserror allows us to avoid writing lots of boilerplate code:

use thiserror::Error;

/// WordCountError enumerates all possible errors returned by this library.
#[derive(Error, Debug)]
pub enum WordCountError {
    /// Represents an empty source. For example, an empty text file being given
    /// as input to `count_words()`.
    #[error("Source contains no data")]
    EmptySource,

    /// Represents a failure to read from input.
    #[error("Read error")]
    ReadError { source: std::io::Error },

    /// Represents all other cases of `std::io::Error`.
    #[error(transparent)]
    IOError(#[from] std::io::Error),
}

(Quoting the official documentation: “Thiserror deliberately does not appear in your public API. You get the same thing as if you had written an implementation of std::error::Error by hand, and switching from handwritten impls to thiserror or vice versa is not a breaking change.”)
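To give a feel for the boilerplate being avoided, here is a hand-written approximation of the enum above using only the standard library. It is a sketch of roughly equivalent behavior, not the literal code that the derive macro generates.

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum WordCountError {
    EmptySource,
    ReadError { source: io::Error },
    IOError(io::Error),
}

impl fmt::Display for WordCountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WordCountError::EmptySource => write!(f, "Source contains no data"),
            WordCountError::ReadError { .. } => write!(f, "Read error"),
            // `transparent` forwards Display to the wrapped error.
            WordCountError::IOError(err) => fmt::Display::fmt(err, f),
        }
    }
}

impl Error for WordCountError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            WordCountError::EmptySource => None,
            // A field named `source` is reported as the underlying cause.
            WordCountError::ReadError { source } => Some(source),
            // `transparent` also forwards `source()` to the wrapped error.
            WordCountError::IOError(err) => err.source(),
        }
    }
}

// Stands in for `#[from]`: lets `?` convert an `io::Error` automatically.
impl From<io::Error> for WordCountError {
    fn from(err: io::Error) -> Self {
        WordCountError::IOError(err)
    }
}

fn main() {
    let wrapped = WordCountError::ReadError {
        source: io::Error::new(io::ErrorKind::BrokenPipe, "read: broken pipe"),
    };
    println!("{} (caused by: {})", wrapped, wrapped.source().unwrap());

    let forwarded: WordCountError = io::Error::new(io::ErrorKind::Other, "boom").into();
    println!("{}", forwarded);
}
```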

With this error type, we can now change the signature of count_words as follows:

fn count_words<R: Read>(input: &mut R) -> Result<u32, WordCountError> { /* .. */ }

Remember previously, the signature looked like this:

fn count_words<R: Read>(input: &mut R) -> Result<u32, Box<dyn Error>> { /* .. */ }

Compared to the previous version, our new code is a lot more specific. Users now get a lot more insight into the possible error cases that might be returned. As an added benefit, we also no longer have to box the error, because unlike dyn Error, the size of WordCountError can be determined at compile time.

Returning library errors

In WordCountError above we specify three possible types of error.

EmptySource may be considered an error related to our business domain. We can return this from our count_words function using the following code:

if wordcount == 0 {
    return Err(WordCountError::EmptySource);
}

ReadError is an example of wrapping a lower-level error into our high-level library error. This is used to return a meaningful error for read errors and can be seen here:

for line in reader.lines() {
    let line = line.map_err(|source| WordCountError::ReadError { source })?;
    for _word in line.split_whitespace() {
        wordcount += 1;
    }
}

The most interesting code in the snippet above is found on its second line, which contains line.map_err(|source| WordCountError::ReadError { source })?;. There’s quite a lot going on here though, so let’s unpack this step by step:

We iterate over the lines from reader, which get returned as io::Result<String> because read operations can fail.
If the result is of the Err variant, our use of map_err() transforms the error value embedded inside this result from an io::Error into a WordCountError::ReadError. If the result is of the Ok variant, it remains unchanged.
We then unpack the result with the ? operator. If it was of the Ok variant then this is assigned to the variable line. If it was of the Err variant, the function exits here, returning this as the return value (remember the return type is Result<u32, WordCountError>).

Because we encapsulate io::Error under the source attribute of WordCountError::ReadError, our context/error chain remains intact. This ensures anyhow, which we’ll use on the application side of things below, ends up displaying both errors ⁵.
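To see that the chain really is intact, you can walk it yourself with nothing but source(). The sketch below uses a simplified, hand-written stand-in for the ReadError variant (a plain struct) so that it has no dependencies. The error_chain helper is a hypothetical name for this illustration.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Simplified stand-in for `WordCountError::ReadError`.
#[derive(Debug)]
struct ReadError {
    source: io::Error,
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Read error")
    }
}

impl Error for ReadError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collect the Display message of `err` and of every error behind it.
fn error_chain(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}

fn main() {
    let err = ReadError {
        source: io::Error::new(io::ErrorKind::BrokenPipe, "read: broken pipe"),
    };
    for (depth, message) in error_chain(&err).iter().enumerate() {
        println!("{}: {}", depth, message);
    }
}
```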

Transparent forwarding

At this point, it’s worth noting that error variants may use #[error(transparent)] to forward the source and Display methods straight through to an underlying error without adding an additional message. This can be seen in the WordCountError::IOError case, which acts as a “catch-all” variant for all other IO errors.

If we didn’t care for the specialized WordCountError::ReadError variant, this means we could have also written our code as follows, in which case we no longer need to use map_err() and can use ? directly:

for line in reader.lines() {
    for _word in line?.split_whitespace() {
        wordcount += 1;
    }
}

With this pattern, we avoid adding additional error wrapping code while still transforming errors into our high-level WordCountError in order to keep our public API clean.
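The reason ? can be used directly here is that it converts the error through the From trait before returning it. The sketch below writes that conversion out by hand next to the ? version. It uses a cut-down enum with a manual From impl (standing in for what #[from] provides) and takes a BufRead directly to keep the example short. Both function names are made up for this illustration.

```rust
use std::io::{self, BufRead};

#[derive(Debug)]
enum WordCountError {
    IOError(io::Error),
}

// Stands in for the conversion that `#[from]` provides.
impl From<io::Error> for WordCountError {
    fn from(err: io::Error) -> Self {
        WordCountError::IOError(err)
    }
}

/// Uses `?`, which calls `From::from` on the error behind the scenes.
fn count_with_question_mark<R: BufRead>(reader: R) -> Result<u32, WordCountError> {
    let mut wordcount = 0;
    for line in reader.lines() {
        for _word in line?.split_whitespace() {
            wordcount += 1;
        }
    }
    Ok(wordcount)
}

/// The same logic with the conversion and early return spelled out.
fn count_spelled_out<R: BufRead>(reader: R) -> Result<u32, WordCountError> {
    let mut wordcount = 0;
    for line in reader.lines() {
        let line = match line {
            Ok(line) => line,
            Err(err) => return Err(WordCountError::from(err)),
        };
        for _word in line.split_whitespace() {
            wordcount += 1;
        }
    }
    Ok(wordcount)
}

fn main() {
    let text: &[u8] = b"a b c\nd e";
    println!("{:?}", count_with_question_mark(text));
    println!("{:?}", count_spelled_out(text));
}
```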

Application errors

With the API above in place, we can adjust the rest of our code to handle application-level concerns like argument parsing and invocation of wordcounter::count_words.

Using anyhow, we can then end up with this main function:

// Some `use` statements have been omitted here for brevity
use anyhow::{Context, Result};

fn main() -> Result<()> {
    for filename in env::args().skip(1).collect::<Vec<String>>() {
        let mut reader = File::open(&filename).context(format!("unable to open '{}'", filename))?;
        let wordcount =
            count_words(&mut reader).context(format!("unable to count words in '{}'", filename))?;
        println!("{} {}", wordcount, filename);
    }
    Ok(())
}

This has resulted in a couple of changes.

1. Simplified Result type

Instead of having to create custom error types or using std::result::Result<T, Box<dyn Error>> everywhere, we can use anyhow::Result as a more convenient type with less boilerplate.

In the case of main() above, this allows us to directly return anyhow::Result<()>. It feels like a minor thing, but I find that being able to focus only on the success data type without having to annotate additional error types adds a lot of clarity here.

2. Annotating errors

The anyhow::Context trait, which we brought in via use anyhow::Context above, enables a context() method on Result types. This lets us wrap/annotate errors with more information in a way that is more ergonomic to write than the map_err approach used in the library code:

let mut reader = File::open(&filename)
    .context(format!("unable to open '{}'", filename))?;

let wordcount = count_words(&mut reader)
    .context(format!("unable to count words in '{}'", filename))?;

This provides valuable information to the user of the application about what was being attempted should an error occur. With these calls in place our errors will now be displayed as follows:

$ cargo run --quiet -- words.txt
Error: unable to open 'words.txt'

Caused by:
    No such file or directory (os error 2)

$ cargo run --quiet -- words.txt
Error: unable to count words in 'words.txt'

Caused by:
    0: Read error
    1: read: broken pipe

In both cases our error message now includes the name of the file we were working with. We also describe what high-level operation was being attempted when the problem occurred.
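Conceptually, adding context means wrapping the original error in a new layer that carries the message and points back at the original through source(). The dependency-free sketch below imitates that idea. WithContext and ContextExt are hypothetical names invented for this illustration, and this is not how anyhow is actually implemented.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical wrapper: a message plus the error that caused it.
#[derive(Debug)]
struct WithContext {
    message: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Hypothetical extension trait adding a `context()` method to `Result`.
trait ContextExt<T> {
    fn context(self, message: impl Into<String>) -> Result<T, WithContext>;
}

impl<T, E: Error + 'static> ContextExt<T> for Result<T, E> {
    fn context(self, message: impl Into<String>) -> Result<T, WithContext> {
        // Only the Err case is touched; Ok values pass through unchanged.
        self.map_err(|err| WithContext {
            message: message.into(),
            source: Box::new(err),
        })
    }
}

fn main() {
    let filename = "words.txt";
    let failed: Result<(), io::Error> =
        Err(io::Error::new(io::ErrorKind::NotFound, "no such file"));
    let err = failed
        .context(format!("unable to open '{}'", filename))
        .unwrap_err();
    println!("Error: {}", err);
    if let Some(cause) = err.source() {
        println!("Caused by: {}", cause);
    }
}
```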

3. Error display

You’ll notice we didn’t have to write any extra error formatting code to get these nice error displays. All we had to do is change the return type of main into anyhow’s Result type.

It isn’t necessary to rely on this implicit behavior of returning a Result from main. We could choose to move all of our code into a run function instead and then write main as follows:

fn main() {
    if let Err(err) = wordcount::run() {
        eprintln!("Error: {:?}", err);
        std::process::exit(1);
    }
}

This will result in exactly the same output.

One advantage of this approach (beyond having more control over how our program exits, such as via a different exit code) is that it allows us to change how the message is formatted.

For example, if we use eprintln!("{:#?}", err) instead (note the {:#?} vs {:?}), we’ll get a struct-style representation:

$ cargo run --quiet -- words.txt
Error {
    context: "unable to count words in \'words.txt\'",
    source: ReadError {
        source: Custom {
            kind: BrokenPipe,
            error: "read: broken pipe",
        },
    },
}

(The various different options are documented under anyhow’s Display representations.)
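Coming back to the point about exit codes: once main owns the error, it can also pick an exit status based on what went wrong. The sketch below uses Box<dyn Error> rather than anyhow to stay dependency-free. The exit_code helper and the specific code numbers are arbitrary choices for this illustration.

```rust
use std::error::Error;
use std::fs::File;
use std::io;

/// Map an error to a process exit code.
/// The numbers are arbitrary choices for this sketch.
fn exit_code(err: &(dyn Error + 'static)) -> i32 {
    match err.downcast_ref::<io::Error>().map(io::Error::kind) {
        Some(io::ErrorKind::NotFound) => 2,
        Some(io::ErrorKind::PermissionDenied) => 3,
        _ => 1,
    }
}

/// Stand-in for the real `run()`: just tries to open every file.
fn run(filenames: &[String]) -> Result<(), Box<dyn Error>> {
    for filename in filenames {
        File::open(filename)?;
    }
    Ok(())
}

fn main() {
    let filenames: Vec<String> = std::env::args().skip(1).collect();
    if let Err(err) = run(&filenames) {
        eprintln!("Error: {}", err);
        std::process::exit(exit_code(&*err));
    }
}
```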

Backtraces

So far we haven’t talked about backtraces, which are a common tool to use when debugging complex issues.

Anyhow also allows us to capture and display a backtrace when an error happens. At the moment, support for backtraces is only available on nightly Rust though, as the std::backtrace module is currently a nightly-only experimental API.

When using the nightly channel, setting RUST_BACKTRACE appropriately will enable backtraces:

$ RUST_BACKTRACE=1 cargo run --quiet -- words.txt
Error: unable to count words in 'words.txt'

Caused by:
    0: Read error
    1: read: broken pipe

   0: <E as anyhow::context::ext::StdError>::ext_context
             at /home/zoni/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.28/src/backtrace.rs:26
   1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::context::{{closure}}
             at /home/zoni/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.28/src/context.rs:50
   2: core::result::Result<T,E>::map_err
             at /rustc/2454a68cfbb63aa7b8e09fe05114d5f98b2f9740/src/libcore/result.rs:612
   3: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::context
             at /home/zoni/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.28/src/context.rs:50
   4: wordcount::run
             at src/lib.rs:58
   5: rwc::main
             at src/main.rs:9
   6: std::rt::lang_start::{{closure}}
             at /rustc/2454a68cfbb63aa7b8e09fe05114d5f98b2f9740/src/libstd/rt.rs:67
   7: std::rt::lang_start_internal::{{closure}}
             at src/libstd/rt.rs:52
      std::panicking::try::do_call
             at src/libstd/panicking.rs:297
      std::panicking::try
             at src/libstd/panicking.rs:274
      std::panic::catch_unwind
             at src/libstd/panic.rs:394
      std::rt::lang_start_internal
             at src/libstd/rt.rs:51
   8: std::rt::lang_start
             at /rustc/2454a68cfbb63aa7b8e09fe05114d5f98b2f9740/src/libstd/rt.rs:67
   9: main
  10: __libc_start_main
  11: _start

I generally find Rust’s backtraces too cryptic and confusing to be of much help, so their lack of support on the stable channel hasn’t been a problem for me personally. Having the chain of errors displayed by anyhow has been more than sufficient for me so far.

Conclusion

This is not the end of Rust’s error story. Changes are still underway and it remains to be seen whether these two libraries are going to remain as favored as they are today.

One thing is for sure though: The story for error handling has come a long way and with the current state of Rust, you can write very robust software in a pleasant and practical manner.

I hope you found this article useful. If you did, please consider sending a quick thank you note, either through email or via a tweet to @NickGroenen.

Feedback and conversations

There’s a bit of discussion happening on Reddit and one comment posted there seems worth including here. u/Yaahallo writes:

I think the bit about error handling being different depending on if you’re writing a library vs an application is simplification that’s common in the rust community but also a source of confusion.

The reasons for using anyhow vs thiserror aren’t really based on if it’s a library or an application, it’s actually about whether or not you need to handle errors or report them.

Libraries often want to support as many error handling use cases for their consumers as possible. This ends up meaning that they want to export error types that are both handleable (aka an enum) and reportable (aka implements std::error::Error).

Applications on the other hand often end up doing the error handling or reporting. For handling you don’t need a library usually, you just use match. For reporting you do need an error type, or more accurately an error reporting type, which is exactly what anyhow::Error is designed to do.

BurntSushi (of ripgrep fame) agrees with a lot of my points but also challenges the use of proc-macro based libraries like thiserror for certain use cases, primarily due to the increase in compilation time that results from their use. To add to his point, he shows us how to write the WordCountError implementation from this article by hand.

And there’s also an interesting thread regarding the performance implications of using context() versus with_context().
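The thread isn't reproduced here, but the general trade-off is eager versus lazy construction of the message: context(format!(..)) builds the string on every call, even when nothing fails, whereas with_context(|| format!(..)) takes a closure that only runs on the error path. The sketch below shows the same distinction with the standard library's Option::ok_or and Option::ok_or_else as an analogy. It does not use anyhow, and the describe helper is hypothetical.

```rust
use std::cell::Cell;

/// Stand-in for an expensive message such as a `format!` call.
/// Counts how often it runs.
fn describe(calls: &Cell<u32>, filename: &str) -> String {
    calls.set(calls.get() + 1);
    format!("unable to open '{}'", filename)
}

/// Eager: the message is built before `ok_or` runs, even when `value` is `Some`.
fn eager(value: Option<u32>, calls: &Cell<u32>) -> Result<u32, String> {
    value.ok_or(describe(calls, "words.txt"))
}

/// Lazy: the closure only runs when `value` is `None`.
fn lazy(value: Option<u32>, calls: &Cell<u32>) -> Result<u32, String> {
    value.ok_or_else(|| describe(calls, "words.txt"))
}

fn main() {
    let eager_calls = Cell::new(0);
    let lazy_calls = Cell::new(0);
    for _ in 0..1000 {
        let _ = eager(Some(1), &eager_calls);
        let _ = lazy(Some(1), &lazy_calls);
    }
    println!(
        "eager built {} messages, lazy built {}",
        eager_calls.get(),
        lazy_calls.get()
    );
}
```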

¹ This is not meant to be taken as criticism. Introducing the entire problem space and all of the different considerations to handling errors does not feel like a good fit for the purpose of the book. That being said, from a newcomer perspective, more official guidance on this topic somewhere that is easily discoverable and accessible would be nice to have.
² These include improvements around Display, a proper source method and support for a backtrace API. See RFC 2504 and its associated tracking issue for details.
³ I can highly recommend reading Error Handling Survey by Yoshua Wuyts for an overview of various alternatives.
⁴ count_words() takes as parameter any type which implements the trait std::io::Read. Implementors of the Read trait are called readers. Readers are defined by one required method, read().
⁵ Because this uses the source() method on std::error::Error it’s not specific to anyhow. It will work with any library supporting RFC 2504.