PrevUpHomeNext

Backtracking

As described in the previous page, backtracking occurs when the parse attempts to match the current parser P, matches part of the input, but fails to match all of P. The part of the input consumed during the parse of P is essentially "given back".

This is necessary because P may consist of subparsers, and each subparser that succeeds will try to consume input, produce attributes, etc. When a later subparser fails, the parse of P fails, and the input must be rewound to where it was when P started its parse, not where the latest matching subparser stopped.

Alternative parsers will often evaluate multiple subparsers one at a time, advancing and then restoring the input position, until one of the subparsers succeeds. Consider this example.

namespace bp = boost::parser;
auto const parser = repeat(53)[other_parser] | repeat(10)[other_parser];

Evaluating parser means trying to match other_parser 53 times, and if that fails, trying to match other_parser 10 times. Say you parse input that matches other_parser 11 times. parser will match it. It will also evaluate other_parser 21 times during the parse.

The attributes of the repeat(53)[other_parser] and repeat(10)[other_parser] are each std::vector<ATTR(other_parser)>; let's say that ATTR(other_parser) is int. The attribute of parser as a whole is the same, std::vector<int>. Since other_parser is busy producing ints — 21 of them to be exact — you may be wondering what happens to the ones produced during the evaluation of repeat(53)[other_parser] when it fails to find all 53 inputs. Its std::vector<int> will contain 11 ints at that point.

When a repeat-parser fails, and attributes are being generated, it clears its container. This applies to parsers such as the ones above, but also all the other repeat parsers, including ones made using operator+ or operator*.

So, at the end of a successful parse by parser of 10 inputs (since the right side of the alternative only eats 10 repetitions), the std::vector<int> attribute of parser would contain 10 ints.

[Note] Note

Users of Boost.Spirit may be familiar with the hold[] directive. Because of the behavior described above, there is no such directive in Boost.Parser.

Expectation points

Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.

auto const value_p_def =
    number | bp::bool_ | null | string | array_p | object_p;

What happens if I use this to parse "\""? The parse tries number, fails. It then tries bp::bool_, fails. Then null fails too. Finally, it starts parsing string. Good news, the first character is the open-quote of a JSON string. Unfortunately, that's also the end of the input, so string must fail too. However, we probably don't want to just give up on parsing string now and try array_p, right? If the user wrote an open-quote with no matching close-quote, that's not the prefix of some later alternative of value_p_def; it's ill-formed JSON. Here's the parser for the string rule:

auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];

Notice that operator> is used on the right instead of operator>>. This indicates the same sequence operation as operator>>, except that it also represents an expectation. If the parse before the operator> succeeds, whatever comes after it must also succeed. Otherwise, the top-level parse is failed, and a diagnostic is emitted. It will say something like "Expected '"' here.", quoting the line, with a caret pointing to the place in the input where it expected the right-side match.

Choosing to use > versus >> is how you indicate to Boost.Parser that parse failure is or is not a hard error, respectively.


PrevUpHomeNext