As described in the previous page, backtracking occurs when the parse attempts
to match the current parser P
,
matches part of the input, but fails to match all of P
.
The part of the input consumed during the parse of P
is essentially "given back".
This is necessary because P
may consist of subparsers, and each subparser that succeeds will try to consume
input, produce attributes, etc. When a later subparser fails, the parse of
P
fails, and the input must
be rewound to where it was when P
started its parse, not where the latest matching subparser stopped.
Alternative parsers will often evaluate multiple subparsers one at a time, advancing and then restoring the input position, until one of the subparsers succeeds. Consider this example.
namespace bp = boost::parser; auto const parser = repeat(53)[other_parser] | repeat(10)[other_parser];
Evaluating parser
means trying
to match other_parser
53
times, and if that fails, trying to match other_parser
10 times. Say you parse input that matches other_parser
11 times. parser
will match
it. It will also evaluate other_parser
21 times during the parse.
The attributes of the repeat(53)[other_parser]
and repeat(10)[other_parser]
are each std::vector<
; let's say that ATTR
(other_parser)>
is ATTR
(other_parser)int
.
The attribute of parser
as
a whole is the same, std::vector<int>
.
Since other_parser
is busy
producing int
s — 21 of
them to be exact — you may be wondering what happens to the ones produced
during the evaluation of repeat(53)[other_parser]
when it fails to find all 53 inputs. Its std::vector<int>
will contain 11 int
s at that
point.
When a repeat-parser fails, and attributes are being generated, it clears
its container. This applies to parsers such as the ones above, but also all
the other repeat parsers, including ones made using operator+
or operator*
.
So, at the end of a successful parse by parser
of 10 inputs (since the right side of the alternative only eats 10 repetitions),
the std::vector<int>
attribute
of parser
would contain 10
int
s.
Note | |
---|---|
Users of Boost.Spirit may be familiar with the |
Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.
auto const value_p_def = number | bp::bool_ | null | string | array_p | object_p;
What happens if I use this to parse "\""
?
The parse tries number
, fails.
It then tries bp::bool_
, fails. Then null
fails too. Finally, it starts parsing string
.
Good news, the first character is the open-quote of a JSON string. Unfortunately,
that's also the end of the input, so string
must fail too. However, we probably don't want to just give up on parsing
string
now and try array_p
, right? If the user wrote an open-quote
with no matching close-quote, that's not the prefix of some later alternative
of value_p_def
; it's ill-formed
JSON. Here's the parser for the string
rule:
auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];
Notice that operator>
is used on the right instead of operator>>
. This indicates the same sequence
operation as operator>>
,
except that it also represents an expectation. If the parse before the operator>
succeeds, whatever comes after it must also
succeed. Otherwise, the top-level parse is failed, and a diagnostic is emitted.
It will say something like "Expected '"' here.", quoting the
line, with a caret pointing to the place in the input where it expected the
right-side match.
Choosing to use >
versus
>>
is how you indicate
to Boost.Parser that parse failure is or is not a hard error, respectively.