Read CSV data with a parsing expression grammar
pest takes a different approach from the combinator libraries: instead
of writing parser code, you describe the language in a Parsing Expression
Grammar kept in a separate .pest file, and pest_derive generates the
parser at compile time. Each named rule becomes a variant of a generated Rule
enum.
The grammar below recognises a tiny CSV of numbers:
WHITESPACE = _{ " " | "\t" }
field = @{ "-"? ~ (ASCII_DIGIT | ".")+ }
record = { field ~ ("," ~ field)* }
file = { SOI ~ (record ~ NEWLINE)* ~ record? ~ EOI }
This recipe feeds input to the derived CsvParser, walks the resulting parse
tree with Pairs::into_inner, and sums every field. Because the grammar
encodes what is valid, malformed input produces a pest::error::Error that
renders a caret pointing at the exact line and column — note the ? operator
propagates it like any other error, with no unwrap.
use std::error::Error;
use pest::Parser;
use pest_derive::Parser;
/// `pest` derives a parser from a PEG grammar kept in a separate `.pest`
/// file. Each named rule in `csv.pest` becomes a `Rule` variant, and the
/// grammar — not hand-written code — describes what is valid.
#[derive(Parser)]
#[grammar = "csv.pest"]
struct CsvParser;
/// Parse a tiny CSV of numbers and return their sum. Walking the parse
/// tree is a matter of iterating `into_inner()` over the matched rules.
fn sum_csv(input: &str) -> Result<f64, Box<dyn Error>> {
let file = CsvParser::parse(Rule::file, input)?
.next()
.ok_or("no file rule produced")?;
let mut total = 0.0;
for record in file.into_inner() {
if record.as_rule() == Rule::record {
for field in record.into_inner() {
total += field.as_str().parse::<f64>()?;
}
}
}
Ok(total)
}
fn main() -> Result<(), Box<dyn Error>> {
let total = sum_csv("1, 2, 3\n4, 5, 6")?;
println!("sum = {total}");
assert_eq!(total, 21.0);
// pest errors carry line/column information and render a caret that
// points at the offending input.
if let Err(err) = sum_csv("1, 2\n3, x") {
println!("{err}");
}
Ok(())
}
Parsing 1, 2\n3, x prints a located error:
--> 2:4
|
2 | 3, x
| ^---
|
= expected field