Parsing content from string

Parse tagged fields from a log line

nom is a parser-combinator library: you build a parser by composing small parsers, each of which consumes part of the input and hands the rest to the next. A parser is any function fn(Input) -> IResult<Input, Output>, where IResult carries either the parsed value together with the unconsumed tail, or an error.
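To make that shape concrete, here is a minimal hand-rolled sketch that uses only the standard library. It does not use nom: the `ParseResult` alias and the `literal`, `digits` and `line_field` helpers are hypothetical stand-ins (loosely in the spirit of `IResult`, `tag`, `digit1` and `preceded`), and the error type is a plain `String` for brevity.

```rust
/// Hypothetical stand-in for nom's `IResult`: on success, the unconsumed
/// tail plus the parsed value; on failure, a message.
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Matches a fixed prefix and returns what follows it.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, ()> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, ())),
        None => Err(format!("expected {expected:?}")),
    }
}

/// Consumes one or more leading ASCII digits.
fn digits(input: &str) -> ParseResult<'_, &str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected at least one digit".to_string());
    }
    Ok((&input[end..], &input[..end]))
}

/// Composition: each step receives the tail left by the previous one.
fn line_field(input: &str) -> ParseResult<'_, &str> {
    let (rest, ()) = literal("line=", input)?;
    digits(rest)
}

fn main() {
    assert_eq!(line_field("line=42 tail"), Ok((" tail", "42")));
    assert!(line_field("level=warn").is_err());
}
```

Threading the tail by hand like this is the bookkeeping that a combinator library takes over.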

This recipe parses a log line like level=warn line=42 into a struct. Tokens are matched with tag and alternatives with alt; value maps each matched keyword to a Level without a closure. A fallible conversion (string to u32) is wrapped in map_res so a bad number becomes a parse error rather than a panic. Finish then collapses nom's error wrapper, which distinguishes recoverable errors, unrecoverable failures and incomplete input, into a plain Result; because only complete parsers are used here, the incomplete case cannot arise.

use std::error::Error;

use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::{digit1, space1};
use nom::combinator::{map, map_res, value};
use nom::sequence::{preceded, separated_pair};
use nom::{Finish, IResult, Parser};

/// A single log line such as `level=warn line=42` parsed into a struct.
#[derive(Debug, PartialEq)]
struct LogEntry {
    level: Level,
    line: u32,
}

#[derive(Clone, Debug, PartialEq)]
enum Level {
    Info,
    Warn,
    Error,
}

/// `nom` builds a parser by combining small parsers. Each one takes the
/// remaining input and returns the parsed value plus the unconsumed tail,
/// so combinators like `separated_pair` and `preceded` thread that tail
/// through for you. `parse` drives one of these combinators over the input.
///
/// `value` matches a token and yields a fixed value, ignoring the matched
/// text — exactly what mapping each keyword to a `Level` needs.
fn level(input: &str) -> IResult<&str, Level> {
    alt((
        value(Level::Info, tag("info")),
        value(Level::Warn, tag("warn")),
        value(Level::Error, tag("error")),
    ))
    .parse(input)
}

/// `map_res` runs a fallible conversion (`str::parse`) and turns an `Err`
/// into a parse failure instead of panicking.
fn number(input: &str) -> IResult<&str, u32> {
    map_res(digit1, str::parse).parse(input)
}

/// `level=<level> line=<number>`, separated by whitespace.
fn log_entry(input: &str) -> IResult<&str, LogEntry> {
    map(
        separated_pair(
            preceded(tag("level="), level),
            space1,
            preceded(tag("line="), number),
        ),
        |(level, line)| LogEntry { level, line },
    )
    .parse(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    // `Finish` converts the streaming-style result into a plain `Result`;
    // `?` then propagates a parse error instead of panicking. The empty
    // remaining input is discarded.
    let (_, entry) = log_entry("level=warn line=42").finish()?;
    println!("{entry:?}");
    assert_eq!(
        entry,
        LogEntry {
            level: Level::Warn,
            line: 42
        }
    );
    Ok(())
}
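
The reason `number` needs `map_res` rather than a plain `map` is that `digit1` only guarantees digits, not that they fit in a `u32`. The std-only sketch below shows the conversion on its own, without nom; `to_line_number` is a hypothetical name used for illustration.

```rust
use std::num::ParseIntError;

/// The same conversion that `number` hands to `map_res`, shown in isolation.
fn to_line_number(digits: &str) -> Result<u32, ParseIntError> {
    digits.parse::<u32>()
}

fn main() {
    assert_eq!(to_line_number("42"), Ok(42));
    // The largest value that still fits.
    assert_eq!(to_line_number("4294967295"), Ok(u32::MAX));
    // All digits, but one past `u32::MAX`: an `Err`, not a panic.
    assert!(to_line_number("4294967296").is_err());
}
```

Inside the parser, that `Err` is what `map_res` reports as a parse failure.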

Decode a hex color

A second nom parser, this one decoding bytes from hexadecimal text: it turns an HTML-style #1b2a3c color literal into its red, green and blue components.

take_while_m_n consumes between m and n characters matching a predicate (here both bounds are 2, so exactly two hex digits), and map_res converts each pair into a u8, failing the parse instead of panicking on invalid input. A tuple of parsers, (hex_byte, hex_byte, hex_byte), is itself a parser that runs each in sequence, and preceded discards the leading #. Finish turns the streaming-style result into a plain Result once parsing is complete.

use std::error::Error;

use nom::bytes::complete::take_while_m_n;
use nom::character::complete::char;
use nom::combinator::map_res;
use nom::sequence::preceded;
use nom::{Finish, IResult, Parser};

/// An HTML-style `#1b2a3c` color literal decoded into its red, green and
/// blue components.
#[derive(Debug, PartialEq)]
struct Color {
    red: u8,
    green: u8,
    blue: u8,
}

/// `take_while_m_n` consumes between `m` and `n` characters matching a
/// predicate — here exactly two hex digits — and `map_res` turns them into
/// a byte, failing the parse instead of panicking on bad input.
fn hex_byte(input: &str) -> IResult<&str, u8> {
    map_res(
        take_while_m_n(2, 2, |c: char| c.is_ascii_hexdigit()),
        |hex| u8::from_str_radix(hex, 16),
    )
    .parse(input)
}

/// A leading `#` followed by three hex bytes.
fn color(input: &str) -> IResult<&str, Color> {
    // A tuple of parsers is itself a parser that runs each in sequence.
    let (input, (red, green, blue)) =
        preceded(char('#'), (hex_byte, hex_byte, hex_byte)).parse(input)?;
    Ok((input, Color { red, green, blue }))
}

fn main() -> Result<(), Box<dyn Error>> {
    // `Finish` converts the streaming-style result into a plain `Result`;
    // `?` then propagates a parse error instead of panicking.
    let (_, parsed) = color("#1b2a3c").finish()?;
    println!("{parsed:?}");
    assert_eq!(
        parsed,
        Color {
            red: 0x1b,
            green: 0x2a,
            blue: 0x3c
        }
    );
    Ok(())
}
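
Filtering with `is_ascii_hexdigit` before converting is not redundant: on its own, `u8::from_str_radix` also accepts a leading `+` sign. The std-only sketch below illustrates the difference without nom; `strict_hex_byte` is a hypothetical helper that mirrors the validate-then-convert order used in `hex_byte`.

```rust
/// Hypothetical helper: check the characters first, then convert.
fn strict_hex_byte(pair: &str) -> Option<u8> {
    if pair.len() == 2 && pair.chars().all(|c| c.is_ascii_hexdigit()) {
        u8::from_str_radix(pair, 16).ok()
    } else {
        None
    }
}

fn main() {
    // Plain conversion: valid and invalid digit pairs.
    assert_eq!(u8::from_str_radix("1b", 16), Ok(0x1b));
    assert!(u8::from_str_radix("zz", 16).is_err());

    // Without a prior character check, a leading `+` is accepted.
    assert_eq!(u8::from_str_radix("+f", 16), Ok(0x0f));

    // With the check in front, the same input is rejected.
    assert_eq!(strict_hex_byte("+f"), None);
    assert_eq!(strict_hex_byte("3c"), Some(0x3c));
}
```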