
Alternative Parsers

Frequently, you need to parse something that might have one of several forms. operator| is overloaded to form alternative parsers. For example:

namespace bp = boost::parser;
auto const parser_1 = bp::int_ | bp::eps;

parser_1 matches an integer, or if that fails, it matches epsilon, the empty string. This is equivalent to writing:

namespace bp = boost::parser;
auto const parser_2 = -bp::int_;

However, neither parser_1 nor parser_2 is equivalent to writing this:

namespace bp = boost::parser;
auto const parser_3 = bp::eps | bp::int_; // Does not do what you think.

The reason is that alternative parsers try each of their subparsers, one at a time, and stop on the first one that matches. Epsilon matches anything, since it is zero length and consumes no input. It even matches the end of input. This means that parser_3 is equivalent to eps by itself.
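The first-match rule can be illustrated with a small model that does not use Boost.Parser at all. This is only a sketch: toy_parser, toy_int, toy_eps, and toy_alternative are hypothetical stand-ins invented for this example, not part of the library, and they report only how many characters were consumed.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a parser: given the remaining input, return the
// number of characters consumed on a match, or std::nullopt on failure.
using toy_parser = std::function<std::optional<std::size_t>(std::string_view)>;

// Toy counterpart of int_: matches one or more decimal digits.
inline std::optional<std::size_t> toy_int(std::string_view in)
{
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        ++n;
    if (n == 0)
        return std::nullopt;
    return n;
}

// Toy counterpart of eps: always matches and consumes nothing.
inline std::optional<std::size_t> toy_eps(std::string_view)
{
    return std::size_t{0};
}

// Ordered choice: try each alternative in turn and stop at the first match.
inline std::optional<std::size_t>
toy_alternative(std::vector<toy_parser> const & alts, std::string_view in)
{
    for (auto const & alt : alts) {
        if (auto const consumed = alt(in))
            return consumed;
    }
    return std::nullopt;
}
```

In this model, toy_alternative({toy_int, toy_eps}, "42") consumes 2 characters, while toy_alternative({toy_eps, toy_int}, "42") consumes 0, because the epsilon alternative matches first and the integer alternative is never tried.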

Note

For this reason, writing eps | p for any parser p is considered a bug. Debug builds will assert when eps | p is encountered.

Warning

This kind of error is very common when eps is involved, and also very easy to detect. However, it is also possible to write P1 | P2, where P1 is a prefix of P2, such as int_ | int_ >> int_, or repeat(4)[hex_digit] | repeat(8)[hex_digit]. Because P1 succeeds on any input that P2 would match, P2 can never be the alternative that matches. This is almost certainly an error, but it is impossible to detect in the general case. Remember that rules can be separately compiled, and consider a pair of rules whose associated _def parsers are int_ and int_ >> int_, respectively.
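The prefix problem can be reproduced with a small model of ordered choice that does not use Boost.Parser. This is a sketch under stated assumptions: prefix_matcher, hex_digits, first_match, and matches_all are hypothetical names invented for this example, with hex_digits(n) standing in for repeat(n)[hex_digit], and matches_all modelling a parse that must consume the whole input.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a parser: returns the number of characters
// consumed on a match, or std::nullopt on failure.
using prefix_matcher = std::function<std::optional<std::size_t>(std::string_view)>;

// Toy counterpart of repeat(n)[hex_digit]: exactly n hexadecimal digits.
inline prefix_matcher hex_digits(std::size_t n)
{
    return [n](std::string_view in) -> std::optional<std::size_t> {
        if (in.size() < n)
            return std::nullopt;
        for (std::size_t i = 0; i < n; ++i) {
            if (!std::isxdigit(static_cast<unsigned char>(in[i])))
                return std::nullopt;
        }
        return n;
    };
}

// Ordered choice: the first alternative that matches wins.
inline std::optional<std::size_t>
first_match(std::vector<prefix_matcher> const & alts, std::string_view in)
{
    for (auto const & alt : alts) {
        if (auto const consumed = alt(in))
            return consumed;
    }
    return std::nullopt;
}

// A whole-input match succeeds only if the winning alternative consumed
// every character of the input.
inline bool matches_all(std::vector<prefix_matcher> const & alts, std::string_view in)
{
    auto const consumed = first_match(alts, in);
    return consumed && *consumed == in.size();
}
```

In this model, first_match({hex_digits(4), hex_digits(8)}, "DEADBEEF") consumes only 4 characters, so matches_all fails on that input even though the second alternative alone would have matched all of it. Listing the longer alternative first, as in {hex_digits(8), hex_digits(4)}, lets both "DEADBEEF" and "BEEF" match in full.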

