Read CSV data with a parsing expression grammar

pest takes a different approach from the combinator libraries: instead of writing parser code, you describe the language in a Parsing Expression Grammar kept in a separate .pest file, and pest_derive generates the parser at compile time. Each named rule becomes a variant of a generated Rule enum.

The grammar below recognises a tiny CSV of numbers:

WHITESPACE = _{ " " | "\t" }

field  = @{ "-"? ~ (ASCII_DIGIT | ".")+ }
record =  { field ~ ("," ~ field)* }
file   =  { SOI ~ (record ~ NEWLINE)* ~ record? ~ EOI }
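To make the shape of the field rule concrete, here is a minimal standard-library sketch of the same check, written by hand rather than generated by pest. The helper name matches_field is hypothetical and is not part of the recipe. It only illustrates that the rule constrains the character set, so some text that matches the rule is still not a valid number.

```rust
/// Hypothetical helper (not generated by pest): returns true when the whole
/// of `s` has the shape described by
/// `field = @{ "-"? ~ (ASCII_DIGIT | ".")+ }`.
fn matches_field(s: &str) -> bool {
    // Optional leading minus sign.
    let body = s.strip_prefix('-').unwrap_or(s);
    // One or more ASCII digits or dots.
    !body.is_empty() && body.chars().all(|c| c.is_ascii_digit() || c == '.')
}

fn main() {
    for text in ["42", "-3.5", "1.2.3", "x"] {
        let shape = matches_field(text);
        let number = text.parse::<f64>().is_ok();
        println!("{text:?}: field-shaped = {shape}, parses as f64 = {number}");
    }
}
```

Under this sketch, 1.2.3 is field-shaped but does not convert to f64, which is why the recipe still has to handle the result of parse::<f64>.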

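This recipe feeds input to the derived CsvParser, walks the resulting parse tree with Pair::into_inner, and sums every field. Because the grammar encodes what is valid, malformed input produces a pest::error::Error that renders a caret pointing at the line and column where parsing failed. The ? operator propagates that error like any other, so no unwrap is needed. The field rule checks only which characters appear, so text such as 1.2.3 passes the grammar and is rejected later by parse::<f64>, whose error ? propagates in the same way.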

use std::error::Error;

use pest::Parser;
use pest_derive::Parser;

/// `pest` derives a parser from a PEG grammar kept in a separate `.pest`
/// file. Each named rule in `csv.pest` becomes a `Rule` variant, and the
/// grammar — not hand-written code — describes what is valid.
#[derive(Parser)]
#[grammar = "csv.pest"]
struct CsvParser;

/// Parse a tiny CSV of numbers and return their sum. Walking the parse
/// tree is a matter of iterating `into_inner()` over the matched rules.
fn sum_csv(input: &str) -> Result<f64, Box<dyn Error>> {
    let file = CsvParser::parse(Rule::file, input)?
        .next()
        .ok_or("no file rule produced")?;

    let mut total = 0.0;
    for record in file.into_inner() {
        if record.as_rule() == Rule::record {
            for field in record.into_inner() {
                total += field.as_str().parse::<f64>()?;
            }
        }
    }
    Ok(total)
}

fn main() -> Result<(), Box<dyn Error>> {
    let total = sum_csv("1, 2, 3\n4, 5, 6")?;
    println!("sum = {total}");
    assert_eq!(total, 21.0);

    // pest errors carry line/column information and render a caret that
    // points at the offending input.
    if let Err(err) = sum_csv("1, 2\n3, x") {
        println!("{err}");
    }
    Ok(())
}
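Since sum_csv returns Box<dyn Error>, a caller may want to know whether a failure came from the grammar or from the number conversion. The sketch below shows one way to tell them apart with the standard library's downcasting support. It does not use pest: GrammarError, parse_field, and describe are hypothetical stand-ins introduced only for illustration. With pest, the analogous check would presumably downcast to pest::error::Error<Rule>, which is an assumption rather than something the recipe shows.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseFloatError;

/// Hypothetical stand-in for a grammar-level error such as the one pest reports.
#[derive(Debug)]
struct GrammarError {
    line: usize,
    column: usize,
}

impl fmt::Display for GrammarError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected input at {}:{}", self.line, self.column)
    }
}

impl Error for GrammarError {}

/// Hypothetical parser for one field: reject characters outside the allowed
/// set first, then convert the text to a number.
fn parse_field(text: &str) -> Result<f64, Box<dyn Error>> {
    let allowed = |c: char| c.is_ascii_digit() || c == '.' || c == '-';
    if let Some(bad) = text.chars().position(|c| !allowed(c)) {
        return Err(Box::new(GrammarError { line: 1, column: bad + 1 }));
    }
    // `?` converts the ParseFloatError into Box<dyn Error>.
    Ok(text.parse::<f64>()?)
}

/// Classify a boxed error by checking its concrete type.
fn describe(err: &(dyn Error + 'static)) -> &'static str {
    if err.is::<GrammarError>() {
        "grammar"
    } else if err.is::<ParseFloatError>() {
        "conversion"
    } else {
        "other"
    }
}

fn main() {
    for text in ["2.5", "1.2.3", "4x"] {
        match parse_field(text) {
            Ok(value) => println!("{text}: ok {value}"),
            Err(err) => println!("{text}: {} error: {err}", describe(&*err)),
        }
    }
}
```

Boxing keeps the function signature simple, at the cost of pushing the type check to the caller. A dedicated error enum would make the two cases explicit instead.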

Parsing 1, 2\n3, x prints a located error:

 --> 2:4
  |
2 | 3, x
  |    ^---
  |
  = expected field
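The 2:4 in that message is a 1-based line and column. As a rough illustration of how such a pair relates to a position in the input, the following standard-library sketch converts a byte offset into line and column. The helper line_col is hypothetical and written here only for explanation. pest computes these values itself, so the recipe does not need this code.

```rust
/// Hypothetical helper: convert a byte offset into a 1-based (line, column)
/// pair, counting columns in characters. Panics if `offset` is out of range
/// or not on a character boundary.
fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let before = &input[..offset];
    // Each newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts just after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}

fn main() {
    let input = "1, 2\n3, x";
    let offset = input.find('x').expect("input contains an x");
    let (line, column) = line_col(input, offset);
    println!("{line}:{column}"); // prints "2:4"
}
```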