Boost.Parser comes with all the parsers most parsing tasks will ever need. Each one is a constexpr object, or a constexpr function. Some of the non-functions are also callable, such as char_, which may be used directly, or with arguments, as in char_('a', 'z'). Any parser that can be called, whether a function or callable object, will be called a callable parser from now on. Note that there are no nullary callable parsers; they each take one or more arguments.
Each callable parser takes one or more parse arguments. A parse argument may be a value or an invocable object that accepts a reference to the parse context. The reference parameter may be mutable or constant. For example:
struct get_attribute
{
    template<typename Context>
    auto operator()(Context & ctx)
    {
        return _attr(ctx);
    }
};
This can also be a lambda. For example:
[](auto const & ctx) { return _attr(ctx); }
The operation that produces a value from a parse argument, which may be a value or a callable taking a parse context argument, is referred to as resolving the parse argument. If a parse argument arg can be called with the current context, then the resolved value of arg is arg(ctx); otherwise, the resolved value is just arg.
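The resolution rule above can be illustrated with a small standalone sketch. This is not Boost.Parser's implementation: toy_context and resolve are hypothetical names invented for this example, and the real parse context is a different type. The sketch only shows the "call it if it is callable with the context, otherwise use it as-is" decision.

```cpp
#include <cassert>      // used by the usage lines shown after this block
#include <type_traits>

// Hypothetical stand-in for a parse context. Boost.Parser's real
// context type is different; this only carries one value.
struct toy_context
{
    char last_char;
};

// Hypothetical helper, not a Boost.Parser API: resolves a parse argument.
template<typename Context, typename Arg>
auto resolve(Context & ctx, Arg const & arg)
{
    if constexpr (std::is_invocable_v<Arg const &, Context &>) {
        // arg can be called with the context, so the call result
        // is the resolved value.
        return arg(ctx);
    } else {
        // arg is a plain value, so it resolves to itself.
        return arg;
    }
}
```

With a toy_context ctx{'q'}, resolve(ctx, 'a') yields 'a', while resolve(ctx, [](auto & c) { return c.last_char; }) yields 'q'.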
Some callable parsers take a parse predicate. A parse predicate is not quite the same as a parse argument, because it must be a callable object, and cannot be a value. A parse predicate's return type must be contextually convertible to bool.
For example:
struct equals_three
{
    template<typename Context>
    bool operator()(Context const & ctx)
    {
        return _attr(ctx) == 3;
    }
};
This may of course be a lambda:
[](auto & ctx) { return _attr(ctx) == 3; }
The notional macro RESOLVE() expands to the result of resolving a parse argument or parse predicate. You'll see it used in the rest of the documentation.
An example of how parse arguments are used:
namespace bp = boost::parser;

// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);
Don't worry for now about what the globals are; the take-away is that you can make any argument you pass to a parser depend on the current state of the parse, by using the parse context:
namespace bp = boost::parser;

// This parser parses two code points. For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;
Each callable parser returns a new parser, parameterized using the arguments given in the invocation.
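To make the idea of "a new parser, parameterized using the arguments" concrete, here is a toy model written without Boost.Parser. Every name in it (toy_context, toy_char_range, toy_char, matches) is hypothetical and exists only for illustration. The sketch assumes that the returned parser stores its arguments unresolved and resolves them against the context each time it tries to match, which is what lets one parser object track changing parse state.

```cpp
#include <cassert>      // used by the usage lines shown after this block
#include <type_traits>

// Hypothetical stand-in for the parse context.
struct toy_context
{
    char last_char;
};

// Hypothetical parser type: stores its two parse arguments unresolved.
template<typename Lo, typename Hi>
struct toy_char_range
{
    Lo lo;
    Hi hi;

    // Returns true if c lies in [lo, hi] once both bounds are resolved
    // against the current context.
    template<typename Context>
    bool matches(Context & ctx, char c) const
    {
        auto const resolve = [&ctx](auto const & arg) {
            if constexpr (std::is_invocable_v<decltype(arg), Context &>)
                return arg(ctx); // callable: ask the context
            else
                return arg;      // plain value: use as-is
        };
        return resolve(lo) <= c && c <= resolve(hi);
    }
};

// Hypothetical callable parser: each call returns a new parser object
// parameterized by the arguments it was given.
template<typename Lo, typename Hi>
toy_char_range<Lo, Hi> toy_char(Lo lo, Hi hi)
{
    return {lo, hi};
}
```

Given auto last = [](auto & ctx) { return ctx.last_char; }; and auto p = toy_char('a', last);, the same p accepts 'g' when the context holds 'z' and rejects it when the context holds 'f'.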
This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:
RESOLVE() is a notional macro that expands to the resolution of parse argument or evaluation of a parse predicate (see The Parsers And Their Uses);
"RESOLVE(pred) == true" is a shorthand notation for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
c is a character of type char, char8_t, or char32_t;
str is a string literal of type char const[], char8_t const [], or char32_t const [];
pred is a parse predicate;
arg0, arg1, arg2, ... are parse arguments;
a is a semantic action;
r is an object whose type models parsable_range;
p, p1, p2, ... are parsers; and
escapes is a symbols<T> object, where T is char or char32_t.
Note | |
---|---|
The definition of [parsable_range_concept |
]
Note | |
---|---|
Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below. |
Table 1.6. Parsers and Their Semantics
Parser |
Semantics |
Attribute Type |
Notes |
---|---|---|---|
Matches epsilon, the empty string. Always matches, and consumes no input. |
None. |
Matching |
|
|
Fails to match the input if |
None. |
|
Matches a single whitespace code point (see note), according to the Unicode White_Space property. |
None. |
For more info, see the Unicode
properties. |
|
Matches a single newline (see note), following the "hard" line breaks in the Unicode line breaking algorithm. |
None. |
For more info, see the Unicode
Line Breaking Algorithm. |
|
Matches only at the end of input, and consumes no input. |
None. |
||
|
Always matches, and consumes no input. Generates the attribute
|
|
An important use case for |
Matches any single code point. |
The code point type in Unicode parsing, or |
||
|
Matches exactly the code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
Matches a single code point. |
|
Similar to |
|
Matches a single code point. |
|
Similar to |
|
The code point type in Unicode parsing, or |
|||
Matches a single control-character code point. |
The code point type in Unicode parsing, or |
||
Matches a single decimal digit code point. |
The code point type in Unicode parsing, or |
||
Matches a single punctuation code point. |
The code point type in Unicode parsing, or |
||
Matches a single hexadecimal digit code point. |
The code point type in Unicode parsing, or |
||
Matches a single lower-case code point. |
The code point type in Unicode parsing, or |
||
Matches a single upper-case code point. |
The code point type in Unicode parsing, or |
||
|
Matches exactly the given code point |
None. |
|
|
Matches exactly the given code point |
None. |
|
|
Matches exactly the given string |
None. |
|
|
Matches exactly the given string |
None. |
This is a UDL
that represents |
|
Matches exactly |
|
|
|
Matches exactly |
|
This is a UDL
that represents |
Matches |
|
||
Matches a binary unsigned integral value. |
|
For example, |
|
|
Matches exactly the binary unsigned integral value |
|
|
Matches an octal unsigned integral value. |
|
For example, |
|
|
Matches exactly the octal unsigned integral value |
|
|
Matches a hexadecimal unsigned integral value. |
|
For example, |
|
|
Matches exactly the hexadecimal unsigned integral value |
|
|
Matches an unsigned integral value. |
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value. |
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value. |
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value. |
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches a signed integral value. |
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value. |
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value. |
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value. |
|
||
|
Matches exactly the signed integral value |
|
|
Matches a floating-point number. |
|
||
Matches a floating-point number. |
|
||
|
Matches iff |
|
The special value |
|
Matches iff |
|
The special value |
|
Equivalent to |
|
It is an error to write |
|
Equivalent to |
|
It is an error to write |
|
|
Unlike the other entries in this table, |
|
Matches |
|
The result does not include the quotes. A quote within the string
can be written by escaping it with a backslash. A backslash within
the string can be written by writing two consecutive backslashes.
Any other use of a backslash will fail the parse. Skipping is disabled
while parsing the entire string, as if using |
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
|
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
Important | |
---|---|
All the character parsers, like |
Note | |
---|---|
A slightly more complete description of the attributes generated by these parsers is in a subsequent section. The attributes are repeated here so you can see all the properties of the parsers in one place. |
If you have an integral type IntType that is not covered by any of the Boost.Parser parsers, you can use a more verbose declaration to declare a parser for IntType. If IntType were unsigned, you would use uint_parser. If it were signed, you would use int_parser. For example:
constexpr parser_interface<int_parser<IntType>> hex_int;
uint_parser and int_parser accept three more non-type template parameters after the type parameter. They are Radix, MinDigits, and MaxDigits. Radix defaults to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value meaning that there is no max number of digits.
So, if you wanted to parse exactly eight hexadecimal digits in a row in order to recognize Unicode character literals like C++ has (e.g. \Udeadbeef), you could use this parser for the digits at the end:
constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;