
Directives

A directive is an element of your parser that doesn't have any meaning by itself. Some are second-order parsers that need a first-order parser to do the actual parsing. Others influence the parse in some way. You can often spot a directive lexically by its use of []; directives always use []. Non-directives might, but only when attaching a semantic action.

The directives that are second-order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. The directives repeat() and if_() were already described in the section on parsers; we won't say much about them here.

Interaction with sequence, alternative, and permutation parsers

Sequence, alternative, and permutation parsers do not nest in most cases. (Let's consider just sequence parsers to keep things simple, but most of this logic applies to alternative parsers as well.) a >> b >> c is the same as (a >> b) >> c and a >> (b >> c), and they are each represented by a single seq_parser with three subparsers, a, b, and c. However, if something prevents two seq_parsers from interacting directly, they will nest. For instance, lexeme[a >> b] >> c is a seq_parser containing two parsers, lexeme[a >> b] and c. This is because lexeme[] takes its given parser and wraps it in a lexeme_parser. This in turn turns off the sequence parser combining logic, since neither side of the second operator>> in lexeme[a >> b] >> c is a seq_parser. Sequence parsers have several rules that govern what the overall attribute type of the parser is, based on the positions and attributes of its subparsers (see Attribute Generation). Therefore, it's important to know which directives create a new parser (and what kind), and which ones do not; this is indicated for each directive below.

The directives
repeat()

See The Parsers And Their Uses. Creates a repeat_parser.

if_()

See The Parsers And Their Uses. Creates a seq_parser.

omit[]

omit[p] disables attribute generation for the parser p. Not only does omit[p] have no attribute, but any attribute generation work that normally happens within p is skipped.

This directive can be useful in cases like this: say you have some fairly complicated parser p that generates a large and expensive-to-construct attribute. Now say that you want to write a function that just counts how many times p can match a string (where the matches are non-overlapping). Instead of using p directly, and building all those attributes, or rewriting p without the attribute generation, use omit[].

Creates an omit_parser.

raw[]

raw[p] changes the attribute from ATTR(p) to a view that delimits the subrange of the input that was matched by p. The type of the view is subrange<I>, where I is the type of the iterator used within the parse. Note that this may not be the same as the iterator type passed to parse(). For instance, when parsing UTF-8, the iterator passed to parse() may be char8_t const *, but within the parse it will be a UTF-8 to UTF-32 transcoding (converting) iterator. Just like omit[], raw[] causes all attribute-generation work within p to be skipped.

Similar to the re-use scenario for omit[] above, raw[] could be used to find the locations of all non-overlapping matches of p in a string.

Creates a raw_parser.

string_view[]

string_view[p] is very similar to raw[p], except that it changes the attribute of p to std::basic_string_view<C>, where C is the character type of the underlying range being parsed. string_view[] requires that the underlying range being parsed is contiguous. Since this can only be detected in C++20 and later, string_view[] is not available in C++17 mode.

Similar to the re-use scenario for omit[] above, string_view[] could be used to find the locations of all non-overlapping matches of p in a string. Whether raw[] or string_view[] is more natural to use to report the locations depends on your use case, but they are essentially the same.

Creates a string_view_parser.

no_case[]

no_case[p] enables case-insensitive parsing within the parse of p. This applies to the text parsed by char_(), string(), and bool_ parsers. The number parsers are already case-insensitive. The case-insensitivity is achieved by doing Unicode case folding on the text being parsed and the values in the parser being matched (see note below if you want to know more about Unicode case folding). In the non-Unicode code path, a full Unicode case folding is not done; instead, only the transformations of values less than 0x100 are done. Examples:

#include <boost/parser/transcode_view.hpp> // For as_utfN.

namespace bp = boost::parser;
auto const street_parser = bp::string(u8"Tobias Straße");
assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

auto const alpha_parser = bp::no_case[bp::char_('a', 'z')];
assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!

Everything pretty much does what you'd naively expect inside no_case[], except that the two-character range version of char_ has a limitation. It only compares a code point from the input to its two arguments (e.g. 'a' and 'z' in the example above). It does not do anything special for multi-code point case folding expansions. For instance, char_(U'ß', U'ß') matches the input U"s", which makes sense, since U'ß' expands to U"ss". However, that same parser does not match the input U"ß"! In short, stick to pairs of code points that have single-code point case folding expansions. If you need to support the multi-expanding code points, use the other overload, like: char_(U"abcd/*...*/ß").

Note

Unicode case folding is an operation that makes text uniformly one case. If you case-fold two pieces of text A and B, you can then compare them bitwise to see whether they are the same except for case. Case folding may sometimes expand a code point into multiple code points (e.g. case folding "ẞ" yields "ss"). When such a multi-code point expansion occurs, the expanded code points are in the NFKC normalization form.

Creates a no_case_parser.

lexeme[]

lexeme[p] disables use of the skipper, if a skipper is being used, within the parse of p. This is useful, for instance, if you want to enable skipping in most parts of your parser, but disable it only in one section where it doesn't belong. If you are skipping whitespace in most of your parser, but want to parse strings that may contain spaces, you should use lexeme[]:

namespace bp = boost::parser;
auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];

Without lexeme[], our string parser would correctly match "foo bar", but the generated attribute would be "foobar".

Creates a lexeme_parser.

skip[]

skip[] is like the inverse of lexeme[]. It enables skipping in the parse, even if it was not enabled before. For example, within a call to parse() that uses a skipper, let's say we have these parsers in use:

namespace bp = boost::parser;
auto const one_or_more = +bp::char_;
auto const skip_or_skip_not_there_is_no_try = bp::lexeme[bp::skip[one_or_more] >> one_or_more];

The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.

skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:

namespace bp = boost::parser;
auto const zero_or_more = *bp::char_;
auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];

The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.

Creates a skip_parser.

merge[], separate[], and transform(f)[]

These directives influence the generation of attributes. See the Attribute Generation section for more details on them.

merge[] and separate[] create a copy of the given seq_parser.

transform(f)[] creates a transform_parser.

