Now that you've seen some examples, let's see how parsing works in a bit more detail. Consider this example.
namespace bp = boost::parser;
auto int_pair = bp::int_ >> bp::int_;         // Attribute: tuple<int, int>
auto int_pairs_plus = +int_pair >> bp::int_;  // Attribute: tuple<std::vector<tuple<int, int>>, int>
int_pairs_plus must match a pair of ints (using int_pair) one or more times, and then must match an additional int. In other words, it matches any odd number (greater than 1) of ints in the input. Let's look at how this parse proceeds.
auto result = bp::parse("1 2 3", int_pairs_plus, bp::ws);
At the beginning of the parse, the top-level parser uses its first subparser (if any) to start parsing. So int_pairs_plus, being a sequence parser, passes control to its first subparser, +int_pair. Then +int_pair uses int_pair to do its parsing, which in turn uses bp::int_. This creates a stack of parsers, each one using a particular subparser.
Step 1) The input is "1 2 3", and the stack of active parsers is int_pairs_plus -> +int_pair -> int_pair -> bp::int_. (Read "->" as "uses".) This parses "1", and the whitespace after it is skipped by bp::ws. Control passes to the second bp::int_ parser in int_pair.
Step 2) The input is "2 3" and the stack of parsers looks the same, except the active parser is the second bp::int_ from int_pair. This parser consumes "2", and then bp::ws skips the subsequent space. Since we've finished with int_pair's match, its boost::parser::tuple<int, int> attribute is complete. Its parent is +int_pair, so this tuple attribute is pushed onto the back of +int_pair's attribute, which is a std::vector<boost::parser::tuple<int, int>>. Control passes up to the parent of int_pair, +int_pair. Since +int_pair is a one-or-more parser, it starts a new iteration; control passes to int_pair again.
Step 3) The input is "3" and the stack of parsers looks the same, except the active parser is the first bp::int_ from int_pair again, and we're in the second iteration of +int_pair. This parser consumes "3". Since this is the end of the input, the second bp::int_ of int_pair does not match. This partial match of "3" should not count, since it was not part of a full match. So int_pair indicates its failure, and +int_pair stops iterating. Since it did match once, +int_pair does not fail; it is a one-or-more parser, so failure of its subparser after the first success does not cause it to fail. Control passes to the next parser in sequence within int_pairs_plus.
Step 4) The input is "3" again, and the stack of parsers is int_pairs_plus -> bp::int_. This parses the "3", and the parse reaches the end of input. Control passes to int_pairs_plus, which has just successfully matched with all the parsers in its sequence. It then produces its attribute, a boost::parser::tuple<std::vector<boost::parser::tuple<int, int>>, int>, which gets returned from bp::parse().
Something to take note of between Steps #3 and #4: at the beginning of #4, the input position had returned to where it was at the beginning of #3. This kind of backtracking also happens in alternative parsers when an alternative fails. The next page has more details on the semantics of backtracking.
So far, parsers have been presented as somewhat abstract entities. You may
be wanting more detail. A Boost.Parser parser P
is an invocable object with a pair of call operator overloads. The two functions
are very similar, and in many parsers one is implemented in terms of the
other. The first function does the parsing and returns the default attribute
for the parser. The second function does exactly the same parsing, but takes
an out-param into which it writes the attribute for the parser. The out-param
does not need to be the same type as the default attribute, but they need
to be compatible.
Compatibility means that the default attribute is assignable to the out-param in some fashion. This usually means direct assignment, but it may also mean a tuple -> aggregate or aggregate -> tuple conversion. For sequence types, compatibility means that the sequence type has insert or push_back with the usual semantics. This means that the parser +boost::parser::int_ can fill a std::set<int> just as well as a std::vector<int>.
Some parsers also have additional state that is required to perform a match. For instance, char_ parsers can be parameterized with a single code point to match; the exact value of that code point is stored in the parser object.
No parser has direct support for all the operations defined on parsers (operator|, operator>>, etc.). Instead, there is a template called parser_interface that supports all of these operations. parser_interface wraps each parser, storing it as a data member and adapting it for general use. You should only ever see parser_interface in the debugger, or possibly in some of the reference documentation. You should never have to write it in your own code.