Demystifying Pratt Parsers: A Powerful Approach to Expression Parsing

Expression parsing is one of the more challenging aspects of building a compiler or interpreter, particularly when handling complex syntax rules with different operator precedence levels. Traditional approaches, like recursive descent, often require writing separate functions for each level of precedence, making the code verbose and difficult to maintain.

However, there exists a lesser-known but highly effective alternative: Pratt parsing, also known as top-down operator precedence parsing. Originally introduced by Vaughan Pratt in the 1970s, this method offers a streamlined and intuitive approach to parsing expressions. Despite its efficiency, it remains underutilized compared to other parsing techniques.

In this article, we explore what makes Pratt parsers so powerful, how they simplify expression parsing, and how to implement one from scratch.


Why Traditional Parsing Struggles with Expressions

In languages with infix, prefix, and postfix operators—such as JavaScript, Python, and C++—parsing expressions can become complex. Consider an expression like:

a + b * c - d / e

Each operator has precedence rules (multiplication and division take priority over addition and subtraction) and associativity rules (left-to-right for most binary operators).

Traditional recursive descent parsers require defining multiple parsing functions, each dedicated to a different precedence level. This approach:

  1. Creates excessive code repetition
  2. Makes it difficult to modify operator rules
  3. Obscures the actual grammar rules within parsing logic

For example, parsing an expression in a recursive descent parser might require separate functions for:

  • Multiplication and division (parseMultiplication())
  • Addition and subtraction (parseAddition())
  • Unary operators (parseUnary())

This results in hard-to-read, hard-to-maintain code.


The Pratt Parsing Approach

Key Idea: Operator Precedence is Handled Dynamically

Pratt parsing eliminates the need for multiple functions for different precedence levels. Instead, it processes expressions dynamically based on operator precedence, using a single core parsing function.

It achieves this by:

  1. Defining parsing behaviors in a table instead of hardcoded functions.
  2. Using token-driven parsing, where operators dictate how they should be parsed based on precedence.
  3. Handling prefix, infix, and postfix operators in a unified manner.

Unlike recursive descent, Pratt parsing allows the grammar to be represented declaratively, making it much easier to extend.


Implementing a Pratt Parser

Let’s walk through how to build a Pratt parser for a simple expression language.

1. Tokenizing the Input

Before parsing, we need to tokenize the input into meaningful components. A token consists of:

  • A type (e.g., NUMBER, PLUS, MINUS)
  • A value (e.g., 42, "+")

Example input:

3 + 5 * (2 - 1)

Would be tokenized as:

[
{"type": "NUMBER", "value": "3"},
{"type": "PLUS", "value": "+"},
{"type": "NUMBER", "value": "5"},
{"type": "MULTIPLY", "value": "*"},
{"type": "LEFT_PAREN", "value": "("},
{"type": "NUMBER", "value": "2"},
{"type": "MINUS", "value": "-"},
{"type": "NUMBER", "value": "1"},
{"type": "RIGHT_PAREN", "value": ")"}
]

2. Setting Up the Parser

A Pratt parser consists of two main components:

  1. Prefix Parselets – Handle expressions that start with a token (like numbers or unary operators).
  2. Infix Parselets – Handle expressions that appear between two operands (like + and *).

Defining the Parser Structure

We start by defining a base class for expressions:

abstract class Expression {}

Token Management

The parser must keep track of tokens and consume them as needed:

class Parser {
private final List<Token> tokens;
private int position = 0;

public Parser(List<Token> tokens) {
this.tokens = tokens;
}

private Token peek() {
return position < tokens.size() ? tokens.get(position) : null;
}

private Token consume() {
return tokens.get(position++);
}
}

3. Defining Prefix and Infix Parselets

A prefix parselet handles expressions like numbers (3), variables (x), and unary operations (-a).

interface PrefixParselet {
Expression parse(Parser parser, Token token);
}

class NumberParselet implements PrefixParselet {
public Expression parse(Parser parser, Token token) {
return new NumberExpression(Integer.parseInt(token.getValue()));
}
}

An infix parselet handles binary operators like + and *.

interface InfixParselet {
Expression parse(Parser parser, Expression left, Token token);
int getPrecedence();
}

class BinaryOperatorParselet implements InfixParselet {
private final int precedence;

public BinaryOperatorParselet(int precedence) {
this.precedence = precedence;
}

public Expression parse(Parser parser, Expression left, Token token) {
Expression right = parser.parseExpression(precedence);
return new BinaryExpression(left, token, right);
}

public int getPrecedence() {
return precedence;
}
}

4. Implementing the Core Parsing Algorithm

class Parser {
private final Map<TokenType, PrefixParselet> prefixParselets = new HashMap<>();
private final Map<TokenType, InfixParselet> infixParselets = new HashMap<>();

public Expression parseExpression(int precedence) {
Token token = consume();
PrefixParselet prefix = prefixParselets.get(token.getType());

if (prefix == null) {
throw new ParseException("Unexpected token: " + token);
}

Expression left = prefix.parse(this, token);

while (precedence < getPrecedence()) {
token = consume();
InfixParselet infix = infixParselets.get(token.getType());
left = infix.parse(this, left, token);
}

return left;
}

private int getPrecedence() {
InfixParselet parser = infixParselets.get(peek().getType());
return (parser != null) ? parser.getPrecedence() : 0;
}

public void register(TokenType type, PrefixParselet prefix) {
prefixParselets.put(type, prefix);
}

public void register(TokenType type, InfixParselet infix) {
infixParselets.put(type, infix);
}
}

5. Defining Operator Precedence

To handle operator precedence correctly, we define a precedence table:

class Precedence {
public static final int LOWEST = 1;
public static final int SUM = 2; // +, -
public static final int PRODUCT = 3; // *, /
public static final int PREFIX = 4; // -a, +a
}

Then, we register operators with appropriate precedence:

parser.register(TokenType.NUMBER, new NumberParselet());
parser.register(TokenType.PLUS, new BinaryOperatorParselet(Precedence.SUM));
parser.register(TokenType.MINUS, new BinaryOperatorParselet(Precedence.SUM));
parser.register(TokenType.MULTIPLY, new BinaryOperatorParselet(Precedence.PRODUCT));
parser.register(TokenType.DIVIDE, new BinaryOperatorParselet(Precedence.PRODUCT));

Final Thoughts

Pratt parsing offers a simpler, more extensible, and efficient approach to parsing expressions than traditional recursive descent. By defining grammar rules declaratively and processing expressions dynamically, it enables clearer and more maintainable parsers.

This technique is particularly useful in interpreters, compilers, and domain-specific languages, providing a flexible foundation for handling complex expressions with ease.

For developers working on language design, compilers, or even advanced mathematical expression evaluators, Pratt parsers are a must-know technique that significantly simplifies expression parsing.

Source: journal.stuffwithstuff.com

Scroll to Top