
Cheat Sheet

Here are all the tables containing the various Boost.Parser parsers, examples, etc., all in one place. These are repeated elsewhere in different sections of the tutorial.

The parsers

This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:

[Note] Note

The definition of parsable_range is:

[parsable_range_concept

]

[Note] Note

Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 1.1. Parsers and Their Semantics

Parser

Semantics

Attribute Type

Notes

eps

Matches epsilon, the empty string. Always matches, and consumes no input.

None.

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps, +eps, etc. (this applies to unconditional eps only).

eps(pred)

Fails to match the input if RESOLVE(pred) == false. Otherwise, the semantics are those of eps.

None.

ws

Matches a single whitespace code point (see note), according to the Unicode White_Space property.

None.

For more info, see the Unicode properties. ws may consume one code point or two. It only consumes two code points when it matches "\r\n".

eol

Matches a single newline (see note), following the "hard" line breaks in the Unicode line breaking algorithm.

None.

For more info, see the Unicode Line Breaking Algorithm. eol may consume one code point or two. It only consumes two code points when it matches "\r\n".

eoi

Matches only at the end of input, and consumes no input.

None.

attr(arg0)

Always matches, and consumes no input. Generates the attribute RESOLVE(arg0).

decltype(RESOLVE(arg0)).

An important use case for attr is to provide a default attribute value as a trailing alternative. For instance, an optional comma-delimited list is: int_ % ',' | attr(std::vector<int>{}). Without the "| attr(...)", at least one int_ match would be required.

char_

Matches any single code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0)

Matches exactly the code point RESOLVE(arg0).

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(arg0, arg1)

Matches the next code point n in the input, if RESOLVE(arg0) <= n && n <= RESOLVE(arg1).

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

char_(r)

Matches the next code point n in the input, if n is one of the code points in r.

The code point type in Unicode parsing, or char in non-Unicode parsing. See Attribute Generation.

r is taken to be in a UTF encoding. The exact UTF used depends on r's element type. If you do not pass UTF encoded ranges for r, the behavior of char_ is undefined. Note that ASCII is a subset of UTF-8, so ASCII is fine. EBCDIC is not. r is not copied; a reference to it is taken. The lifetime of char_(r) must be within the lifetime of r. This overload of char_ does not take parse arguments.

cp

Matches a single code point.

char32_t

Similar to char_, but with a fixed char32_t attribute type; cp has all the same call operator overloads as char_, though they are not repeated here, for brevity.

cu

Matches a single code point.

char

Similar to char_, but with a fixed char attribute type; cu has all the same call operator overloads as char_, though they are not repeated here, for brevity. Even though the name "cu" suggests that this parser matches at the code unit level, it does not. The name refers to the attribute type generated, much like the names int_ versus uint_.

blank

Equivalent to ws - eol.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

control

Matches a single control-character code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

digit

Matches a single decimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

punct

Matches a single punctuation code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

hex_digit

Matches a single hexadecimal digit code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lower

Matches a single lower-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

upper

Matches a single upper-case code point.

The code point type in Unicode parsing, or char in non-Unicode parsing. See the entry for char_.

lit(c)

Matches exactly the given code point c.

None.

lit() does not take parse arguments.

c_l

Matches exactly the given code point c.

None.

This is a UDL that represents lit(c), for example 'F'_l.

lit(r)

Matches exactly the given string r.

None.

lit() does not take parse arguments.

str_l

Matches exactly the given string str.

None.

This is a UDL that represents lit(s), for example "a string"_l.

string(r)

Matches exactly r, and generates the match as an attribute.

std::string

string() does not take parse arguments.

str_p

Matches exactly str, and generates the match as an attribute.

std::string

This is a UDL that represents string(s), for example "a string"_p.

bool_

Matches "true" or "false".

bool

bin

Matches a binary unsigned integral value.

unsigned int

For example, bin would match "101", and generate an attribute of 5u.

bin(arg0)

Matches exactly the binary unsigned integral value RESOLVE(arg0).

unsigned int

oct

Matches an octal unsigned integral value.

unsigned int

For example, oct would match "31", and generate an attribute of 25u.

oct(arg0)

Matches exactly the octal unsigned integral value RESOLVE(arg0).

unsigned int

hex

Matches a hexadecimal unsigned integral value.

unsigned int

For example, hex would match "ff", and generate an attribute of 255u.

hex(arg0)

Matches exactly the hexadecimal unsigned integral value RESOLVE(arg0).

unsigned int

ushort_

Matches an unsigned integral value.

unsigned short

ushort_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned short

uint_

Matches an unsigned integral value.

unsigned int

uint_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned int

ulong_

Matches an unsigned integral value.

unsigned long

ulong_(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long

ulong_long

Matches an unsigned integral value.

unsigned long long

ulong_long(arg0)

Matches exactly the unsigned integral value RESOLVE(arg0).

unsigned long long

short_

Matches a signed integral value.

short

short_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

short

int_

Matches a signed integral value.

int

int_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

int

long_

Matches a signed integral value.

long

long_(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long

long_long

Matches a signed integral value.

long long

long_long(arg0)

Matches exactly the signed integral value RESOLVE(arg0).

long long

float_

Matches a floating-point number. float_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by float_.

float

double_

Matches a floating-point number. double_ uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default RealPolicies is supported by double_.

double

repeat(arg0)[p]

Matches iff p matches exactly RESOLVE(arg0) times.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used; it indicates unlimited repetition. decltype(RESOLVE(arg0)) must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(Inf)[eps] (this applies to unconditional eps only).

repeat(arg0, arg1)[p]

Matches iff p matches between RESOLVE(arg0) and RESOLVE(arg1) times, inclusively.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

The special value Inf may be used for the upper bound; it indicates unlimited repetition. decltype(RESOLVE(arg0)) and decltype(RESOLVE(arg1)) each must be implicitly convertible to int64_t. Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters repeat(n, Inf)[eps] (this applies to unconditional eps only).

if_(pred)[p]

Equivalent to eps(pred) >> p.

std::optional<ATTR(p)>

It is an error to write if_(pred). That is, it is an error to omit the conditionally matched parser p.

switch_(arg0)(arg1, p1)(arg2, p2) ...

Equivalent to p1 when RESOLVE(arg0) == RESOLVE(arg1), p2 when RESOLVE(arg0) == RESOLVE(arg2), etc. If there is no such argN, the behavior of switch_() is undefined.

std::variant<ATTR(p1), ATTR(p2), ...>

It is an error to write switch_(arg0). That is, it is an error to omit the conditionally matched parsers p1, p2, ....

symbols<T>

symbols is an associative container of key, value pairs. Each key is a std::string and each value has type T. In the Unicode parsing path, the strings are considered to be UTF-8 encoded; in the non-Unicode path, no encoding is assumed. symbols matches the longest prefix pre of the input that is equal to one of the keys k. If the length len of pre is zero, and there is no zero-length key, it does not match the input. If len is positive, the generated attribute is the value associated with k.

T

Unlike the other entries in this table, symbols is a type, not an object.

quoted_string

Matches '"', followed by zero or more characters, followed by '"'.

std::string

The result does not include the quotes. A quote within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(c, symbols)

Matches c, followed by zero or more characters, followed by c.

std::string

The result does not include the c quotes. A c within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].

quoted_string(r, symbols)

Matches some character Q in r, followed by zero or more characters, followed by Q.

std::string

The result does not include the Q quotes. A Q within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using symbols will be interpreted as the corresponding value produced by symbols. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using lexeme[].


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attr to do so).

Operators defined on parsers

Here are all the operators overloaded for parsers. In the tables below:

[Note] Note

Some of the expressions in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 1.2. Combining Operations and Their Semantics

Expression

Semantics

Attribute Type

Notes

!p

Matches iff p does not match; consumes no input.

None.

&p

Matches iff p matches; consumes no input.

None.

*p

Parses using p repeatedly until p no longer matches; always matches.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters *eps (this applies to unconditional eps only).

+p

Parses using p repeatedly until p no longer matches; matches iff p matches at least once.

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

Matching eps an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters +eps (this applies to unconditional eps only).

-p

Equivalent to p | eps.

std::optional<ATTR(p)>

p1 >> p2

Matches iff p1 matches and then p2 matches.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

>> is associative; p1 >> p2 >> p3, (p1 >> p2) >> p3, and p1 >> (p2 >> p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p >> c

Equivalent to p >> lit(c).

ATTR(p)

p >> r

Equivalent to p >> lit(r).

ATTR(p)

p1 > p2

Matches iff p1 matches and then p2 matches. No back-tracking is allowed after p1 matches; if p1 matches but then p2 does not, the top-level parse fails.

boost::parser::tuple<ATTR(p1), ATTR(p2)> (See note.)

> is associative; p1 > p2 > p3, (p1 > p2) > p3, and p1 > (p2 > p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes; see Attribute Generation for the full rules.

p > c

Equivalent to p > lit(c).

ATTR(p)

p > r

Equivalent to p > lit(r).

ATTR(p)

p1 | p2

Matches iff either p1 matches or p2 matches.

std::variant<ATTR(p1), ATTR(p2)> (See note.)

| is associative; p1 | p2 | p3, (p1 | p2) | p3, and p1 | (p2 | p3) are all equivalent. This attribute type only applies to the case where p1 and p2 both generate attributes, and where the attribute types are different; see Attribute Generation for the full rules.

p | c

Equivalent to p | lit(c).

ATTR(p)

p | r

Equivalent to p | lit(r).

ATTR(p)

p1 || p2

Matches iff p1 matches and p2 matches, regardless of the order they match in.

boost::parser::tuple<ATTR(p1), ATTR(p2)>

|| is associative; p1 || p2 || p3, (p1 || p2) || p3, and p1 || (p2 || p3) are all equivalent. It is an error to include an eps (conditional or non-conditional) in an operator|| expression. Though the parsers are matched in any order, the attribute elements are always in the order written in the operator|| expression.

p1 - p2

Equivalent to !p2 >> p1.

ATTR(p1)

p - c

Equivalent to p - lit(c).

ATTR(p)

p - r

Equivalent to p - lit(r).

ATTR(p)

p1 % p2

Equivalent to p1 >> *(p2 >> p1).

std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>

p % c

Equivalent to p % lit(c).

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p % r

Equivalent to p % lit(r).

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p[a]

Matches iff p matches. If p matches, the semantic action a is executed.

None.


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attr to do so).

There are a couple of special rules not captured in the table above:

First, the zero-or-more and one-or-more repetitions (operator*() and operator+(), respectively) may collapse when combined. For any parser p, +(+p) collapses to +p; **p, *+p, and +*p each collapse to just *p.

Second, using eps in an alternative parser as any alternative except the last one is a common source of errors, so Boost.Parser disallows it. The reason is that, for any parser p, eps | p is equivalent to eps, since eps always matches. This restriction does not apply to eps parameterized with a condition: for any condition cond, eps(cond) is allowed to appear anywhere within an alternative parser.

Attribute generation for certain parsers

This table summarizes the attributes generated for all Boost.Parser parsers. In the table below:

Table 1.3. Parsers and Their Attributes

Parser

Attribute Type

Notes

eps

None.

eol

None.

eoi

None.

attr(x)

decltype(RESOLVE(x))

char_

The code point type in Unicode parsing, or char in non-Unicode parsing; see below.

Includes all the _p UDLs that take a single character, and all character class parsers like control and lower.

cp

char32_t

cu

char

lit(x)

None.

Includes all the _l UDLs.

string(x)

std::string

Includes all the _p UDLs that take a string.

bool_

bool

bin

unsigned int

oct

unsigned int

hex

unsigned int

ushort_

unsigned short

uint_

unsigned int

ulong_

unsigned long

ulong_long

unsigned long long

short_

short

int_

int

long_

long

long_long

long long

float_

float

double_

double

symbols<T>

T


char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.

Here, we're parsing plain chars, meaning that the parsing is in the non-Unicode code path, so the attribute of char_ is char:

auto result = parse("some text", boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char>>);

When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:

auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);

The good news is that usually you don't parse characters individually. When you parse with char_, you usually parse repetitions of them, which will produce a std::string, regardless of whether you're in Unicode parsing mode or not. If you do need to parse individual characters, and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.

Attributes for operations on parsers

Combining operations of course affect the generation of attributes. In the tables below:

Table 1.4. Combining Operations and Their Attributes

Parser

Attribute Type

!p

None.

&p

None.

*p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

+p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

+*p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

*+p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

-p

std::optional<ATTR(p)>

p1 >> p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 > p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 >> p2 >> p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 > p2 >> p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 >> p2 > p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 > p2 > p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 | p2

std::variant<ATTR(p1), ATTR(p2)>

p1 | p2 | p3

std::variant<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 || p2

boost::parser::tuple<ATTR(p1), ATTR(p2)>

p1 || p2 || p3

boost::parser::tuple<ATTR(p1), ATTR(p2), ATTR(p3)>

p1 % p2

std::string if ATTR(p1) is char or char32_t, otherwise std::vector<ATTR(p1)>

p[a]

None.

repeat(arg0)[p]

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

repeat(arg0, arg1)[p]

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

if_(pred)[p]

std::optional<ATTR(p)>

switch_(arg0)(arg1, p1)(arg2, p2)...

std::variant<ATTR(p1), ATTR(p2), ...>


[Important] Important

All the character parsers, like char_, cp and cu, produce either char or char32_t attributes. So when you see "std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>" in the table above, that effectively means that every sequence of character attributes gets turned into a std::string. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use attr to do so).

[Important] Important

In case you did not notice it above, adding a semantic action to a parser erases the parser's attribute. The attribute is still available inside the semantic action as _attr(ctx).

More attributes for operations on parsers

In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.

Table 1.5. Sequence and Alternative Combining Operations and Their Attributes

Expression

Attribute Type

eps >> eps

None.

p >> eps

ATTR(p)

eps >> p

ATTR(p)

cu >> string("str")

std::string

string("str") >> cu

std::string

*cu >> string("str")

boost::parser::tuple<std::string, std::string>

string("str") >> *cu

boost::parser::tuple<std::string, std::string>

p >> p

boost::parser::tuple<ATTR(p), ATTR(p)>

*p >> p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

p >> *p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

*p >> -p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

-p >> *p

std::string if ATTR(p) is char or char32_t, otherwise std::vector<ATTR(p)>

string("str") >> -cu

std::string

-cu >> string("str")

std::string

!p1 | p2[a]

None.

p | p

ATTR(p)

p1 | p2

std::variant<ATTR(p1), ATTR(p2)>

p | eps

std::optional<ATTR(p)>

p1 | p2 | eps

std::optional<std::variant<ATTR(p1), ATTR(p2)>>

p1 | p2[a] | p3

std::optional<std::variant<ATTR(p1), ATTR(p3)>>


