The Parsers And Their Uses

Boost.Parser comes with all the parsers most parsing tasks will ever need. Each one is a constexpr object, or a constexpr function. Some of the non-functions are also callable, such as char_, which may be used directly, or with arguments, as in char_('a', 'z'). Any parser that can be called, whether a function or callable object, will be called a callable parser from now on. Note that there are no nullary callable parsers; they each take one or more arguments.

Each callable parser takes one or more parse arguments. A parse argument may be a value or an invocable object that accepts a reference to the parse context. The reference parameter may be mutable or constant. For example:

struct get_attribute
{
    template<typename Context>
    auto operator()(Context & ctx)
    {
        return _attr(ctx);
    }
};

This can also be a lambda. For example:

[](auto const & ctx) { return _attr(ctx); }

The operation that produces a value from a parse argument, which may be a value or a callable taking a parse context argument, is referred to as resolving the parse argument. If a parse argument arg can be called with the current context, then the resolved value of arg is arg(ctx); otherwise, the resolved value is just arg.
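The resolution rule above can be sketched in plain C++. This is only an illustration of the idea, not Boost.Parser's implementation; toy_context and resolve are hypothetical names invented for this sketch.

```cpp
#include <type_traits>

// Hypothetical stand-in for a parse context. The real context type is
// an implementation detail of Boost.Parser.
struct toy_context
{
    int attr;
};

// Resolves a parse argument: if arg can be called with the context,
// the result of that call is returned; otherwise arg itself is returned.
template<typename Context, typename Arg>
constexpr auto resolve(Context & ctx, Arg const & arg)
{
    if constexpr (std::is_invocable_v<Arg const &, Context &>)
        return arg(ctx);
    else
        return arg;
}
```

With a toy_context whose attr is 7, resolve(ctx, 42) yields 42, while resolve(ctx, [](auto & c) { return c.attr; }) yields 7.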

Some callable parsers take a parse predicate. A parse predicate is not quite the same as a parse argument, because it must be a callable object, and cannot be a value. A parse predicate's return type must be contextually convertible to bool. For example:

struct equals_three
{
    template<typename Context>
    bool operator()(Context const & ctx)
    {
        return _attr(ctx) == 3;
    }
};

This may of course be a lambda:

[](auto & ctx) { return _attr(ctx) == 3; }

The notional macro RESOLVE() expands to the result of resolving a parse argument or parse predicate. You'll see it used in the rest of the documentation.

An example of how parse arguments are used:

namespace bp = boost::parser;
// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);

Don't worry for now about what the globals are; the take-away is that you can make any argument you pass to a parser depend on the current state of the parse, by using the parse context:

namespace bp = boost::parser;
// This parser parses two code points.  For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;

Each callable parser returns a new parser, parameterized using the arguments given in the invocation.

This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:

each entry is a global object usable directly in your parsers, unless otherwise noted;
"code point" is used to refer to the elements of the input range, which assumes that the parse is being done in the Unicode-aware code path (if the parse is being done in the non-Unicode code path, read "code point" as "char");
RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses);
"RESOLVE(pred) == true" is a shorthand notation for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
c is a character of type char, char8_t, or char32_t;
str is a string literal of type char const[], char8_t const [], or char32_t const [];
pred is a parse predicate;
arg0, arg1, arg2, ... are parse arguments;
a is a semantic action;
r is an object whose type models parsable_range;
p, p1, p2, ... are parsers; and
escapes is a symbols<T> object, where T is char or char32_t.

Note

The definition of parsable_range is:

[parsable_range_concept

]

	Note
	Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.

Table 1.6. Parsers and Their Semantics

Parser	Semantics	Attribute Type	Notes
`eps`	Matches epsilon, the empty string. Always matches, and consumes no input.	None.	Matching `eps` an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters `*eps`, `+eps`, etc. (this applies to unconditional `eps` only).
`eps(pred)`	Fails to match the input if `RESOLVE(pred) == false`. Otherwise, the semantics are those of `eps`.	None.
`ws`	Matches a single whitespace code point (see note), according to the Unicode White_Space property.	None.	For more info, see the Unicode properties. `ws` may consume one code point or two. It only consumes two code points when it matches `"\r\n"`.
`eol`	Matches a single newline (see note), following the "hard" line breaks in the Unicode line breaking algorithm.	None.	For more info, see the Unicode Line Breaking Algorithm. `eol` may consume one code point or two. It only consumes two code points when it matches `"\r\n"`.
`eoi`	Matches only at the end of input, and consumes no input.	None.
`attr(arg0)`	Always matches, and consumes no input. Generates the attribute `RESOLVE(arg0)`.	`decltype(RESOLVE(arg0))`.	An important use case for `attr` is to provide a default attribute value as a trailing alternative. For instance, an optional comma-delimited list is: `int_ % ',' | attr(std::vector<int>{})`. Without the "`| attr(...)`", at least one `int_` match would be required.
`char_`	Matches any single code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See Attribute Generation.
`char_(arg0)`	Matches exactly the code point `RESOLVE(arg0)`.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See Attribute Generation.
`char_(arg0, arg1)`	Matches the next code point `n` in the input, if `RESOLVE(arg0) <= n && n <= RESOLVE(arg1)`.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See Attribute Generation.
`char_(r)`	Matches the next code point `n` in the input, if `n` is one of the code points in `r`.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See Attribute Generation.	`r` is taken to be in a UTF encoding. The exact UTF used depends on `r`'s element type. If you do not pass UTF encoded ranges for `r`, the behavior of `char_` is undefined. Note that ASCII is a subset of UTF-8, so ASCII is fine. EBCDIC is not. `r` is not copied; a reference to it is taken. The lifetime of `char_(r)` must be within the lifetime of `r`. This overload of `char_` does not take parse arguments.
`cp`	Matches a single code point.	`char32_t`	Similar to `char_`, but with a fixed `char32_t` attribute type; `cp` has all the same call operator overloads as `char_`, though they are not repeated here, for brevity.
`cu`	Matches a single code point.	`char`	Similar to `char_`, but with a fixed `char` attribute type; `cu` has all the same call operator overloads as `char_`, though they are not repeated here, for brevity. Even though the name "`cu`" suggests that this parser matches at the code unit level, it does not. The name refers to the attribute type generated, much like the names `int_` versus `uint_`.
`blank`	Equivalent to `ws - eol`.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`control`	Matches a single control-character code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`digit`	Matches a single decimal digit code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`punct`	Matches a single punctuation code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`hex_digit`	Matches a single hexadecimal digit code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`lower`	Matches a single lower-case code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`upper`	Matches a single upper-case code point.	The code point type in Unicode parsing, or `char` in non-Unicode parsing. See the entry for `char_`.
`lit(c)`	Matches exactly the given code point `c`.	None.	`lit()` does not take parse arguments.
`c_l`	Matches exactly the given code point `c`.	None.	This is a UDL that represents `lit(c)`, for example `'F'_l`.
`lit(r)`	Matches exactly the given string `r`.	None.	`lit()` does not take parse arguments.
`str_l`	Matches exactly the given string `str`.	None.	This is a UDL that represents `lit(str)`, for example `"a string"_l`.
`string(r)`	Matches exactly `r`, and generates the match as an attribute.	`std::string`	`string()` does not take parse arguments.
`str_p`	Matches exactly `str`, and generates the match as an attribute.	`std::string`	This is a UDL that represents `string(str)`, for example `"a string"_p`.
`bool_`	Matches `"true"` or `"false"`.	`bool`
`bin`	Matches a binary unsigned integral value.	`unsigned int`	For example, `bin` would match `"101"`, and generate an attribute of `5u`.
`bin(arg0)`	Matches exactly the binary unsigned integral value `RESOLVE(arg0)`.	`unsigned int`
`oct`	Matches an octal unsigned integral value.	`unsigned int`	For example, `oct` would match `"31"`, and generate an attribute of `25u`.
`oct(arg0)`	Matches exactly the octal unsigned integral value `RESOLVE(arg0)`.	`unsigned int`
`hex`	Matches a hexadecimal unsigned integral value.	`unsigned int`	For example, `hex` would match `"ff"`, and generate an attribute of `255u`.
`hex(arg0)`	Matches exactly the hexadecimal unsigned integral value `RESOLVE(arg0)`.	`unsigned int`
`ushort_`	Matches an unsigned integral value.	`unsigned short`
`ushort_(arg0)`	Matches exactly the unsigned integral value `RESOLVE(arg0)`.	`unsigned short`
`uint_`	Matches an unsigned integral value.	`unsigned int`
`uint_(arg0)`	Matches exactly the unsigned integral value `RESOLVE(arg0)`.	`unsigned int`
`ulong_`	Matches an unsigned integral value.	`unsigned long`
`ulong_(arg0)`	Matches exactly the unsigned integral value `RESOLVE(arg0)`.	`unsigned long`
`ulong_long`	Matches an unsigned integral value.	`unsigned long long`
`ulong_long(arg0)`	Matches exactly the unsigned integral value `RESOLVE(arg0)`.	`unsigned long long`
`short_`	Matches a signed integral value.	`short`
`short_(arg0)`	Matches exactly the signed integral value `RESOLVE(arg0)`.	`short`
`int_`	Matches a signed integral value.	`int`
`int_(arg0)`	Matches exactly the signed integral value `RESOLVE(arg0)`.	`int`
`long_`	Matches a signed integral value.	`long`
`long_(arg0)`	Matches exactly the signed integral value `RESOLVE(arg0)`.	`long`
`long_long`	Matches a signed integral value.	`long long`
`long_long(arg0)`	Matches exactly the signed integral value `RESOLVE(arg0)`.	`long long`
`float_`	Matches a floating-point number. `float_` uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default `RealPolicies` is supported by `float_`.	`float`
`double_`	Matches a floating-point number. `double_` uses parsing implementation details from Boost.Spirit. The specifics of what formats are accepted can be found in their real number parsers. Note that only the default `RealPolicies` is supported by `double_`.	`double`
`repeat(arg0)[p]`	Matches iff `p` matches exactly `RESOLVE(arg0)` times.	`std::string` if `ATTR(p)` is `char` or `char32_t`, otherwise `std::vector<ATTR(p)>`	The special value `Inf` may be used; it indicates unlimited repetition. `decltype(RESOLVE(arg0))` must be implicitly convertible to `int64_t`. Matching `eps` an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters `repeat(Inf)[eps]` (this applies to unconditional `eps` only).
`repeat(arg0, arg1)[p]`	Matches iff `p` matches between `RESOLVE(arg0)` and `RESOLVE(arg1)` times, inclusively.	`std::string` if `ATTR(p)` is `char` or `char32_t`, otherwise `std::vector<ATTR(p)>`	The special value `Inf` may be used for the upper bound; it indicates unlimited repetition. `decltype(RESOLVE(arg0))` and `decltype(RESOLVE(arg1))` each must be implicitly convertible to `int64_t`. Matching `eps` an unlimited number of times creates an infinite loop, which is undefined behavior in C++. Boost.Parser will assert in debug mode when it encounters `repeat(n, Inf)[eps]` (this applies to unconditional `eps` only).
`if_(pred)[p]`	Equivalent to `eps(pred) >> p`.	`std::optional<ATTR(p)>`	It is an error to write `if_(pred)`. That is, it is an error to omit the conditionally matched parser `p`.
`switch_(arg0)(arg1, p1)(arg2, p2) ...`	Equivalent to `p1` when `RESOLVE(arg0) == RESOLVE(arg1)`, `p2` when `RESOLVE(arg0) == RESOLVE(arg2)`, etc. If there is no such `argN`, the behavior of `switch_()` is undefined.	`std::variant<ATTR(p1), ATTR(p2), ...>`	It is an error to write `switch_(arg0)`. That is, it is an error to omit the conditionally matched parsers `p1`, `p2`, ....
`symbols<T>`	`symbols` is an associative container of key, value pairs. Each key is a `std::string` and each value has type `T`. In the Unicode parsing path, the strings are considered to be UTF-8 encoded; in the non-Unicode path, no encoding is assumed. `symbols` matches the longest prefix `pre` of the input that is equal to one of the keys `k`. If the length `len` of `pre` is zero, and there is no zero-length key, it does not match the input. If `len` is positive, the generated attribute is the value associated with `k`.	`T`	Unlike the other entries in this table, `symbols` is a type, not an object.
`quoted_string`	Matches `'"'`, followed by zero or more characters, followed by `'"'`.	`std::string`	The result does not include the quotes. A quote within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using `lexeme[]`.
`quoted_string(c)`	Matches `c`, followed by zero or more characters, followed by `c`.	`std::string`	The result does not include the `c` quotes. A `c` within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using `lexeme[]`.
`quoted_string(r)`	Matches some character `Q` in `r`, followed by zero or more characters, followed by `Q`.	`std::string`	The result does not include the `Q` quotes. A `Q` within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using `lexeme[]`.
`quoted_string(c, symbols)`	Matches `c`, followed by zero or more characters, followed by `c`.	`std::string`	The result does not include the `c` quotes. A `c` within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using `symbols` will be interpreted as the corresponding value produced by `symbols`. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using `lexeme[]`.
`quoted_string(r, symbols)`	Matches some character `Q` in `r`, followed by zero or more characters, followed by `Q`.	`std::string`	The result does not include the `Q` quotes. A `Q` within the string can be written by escaping it with a backslash. A backslash within the string can be written by writing two consecutive backslashes. A backslash followed by a successful match using `symbols` will be interpreted as the corresponding value produced by `symbols`. Any other use of a backslash will fail the parse. Skipping is disabled while parsing the entire string, as if using `lexeme[]`.

	Important
	All the character parsers, like `char_`, `cp`, and `cu`, produce either `char` or `char32_t` attributes. So when you see "`std::string` if `ATTR(p)` is `char` or `char32_t`, otherwise `std::vector<ATTR(p)>`" in the table above, that effectively means that every sequence of character attributes gets turned into a `std::string`. The only time this does not happen is when you introduce your own rules with attributes using another character type (or use `attr` to do so).

	Note
	A slightly more complete description of the attributes generated by these parsers is in a subsequent section. The attributes are repeated here so you can see all the properties of the parsers in one place.

If you have an integral type IntType that is not covered by any of the Boost.Parser parsers, you can use a more verbose declaration to declare a parser for IntType. If IntType were unsigned, you would use uint_parser. If it were signed, you would use int_parser. For example:

constexpr parser_interface<int_parser<IntType>> hex_int;

uint_parser and int_parser accept three more non-type template parameters after the type parameter. They are Radix, MinDigits, and MaxDigits. Radix defaults to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value meaning that there is no max number of digits.

So, if you wanted to parse exactly eight hexadecimal digits in a row in order to recognize Unicode character literals like C++ has (e.g. \Udeadbeef), you could use this parser for the digits at the end:

constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;