A directive is an element of your parser that has no meaning by itself. Some directives are second-order parsers that need a first-order parser to do the actual parsing. Others influence the parse in some way. You can often spot a directive lexically by its use of []; directives always use []. Non-directives might also use [], but only when attaching a semantic action.
The directives that are second-order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. The directives repeat() and if_() were already described in the section on parsers; we won't say much about them here.
Sequence, alternative, and permutation parsers do not nest in most cases. (Let's consider just sequence parsers to keep things simple, but most of this logic applies to alternative parsers as well.) a >> b >> c is the same as (a >> b) >> c and a >> (b >> c), and they are each represented by a single seq_parser with three subparsers, a, b, and c. However, if something prevents two seq_parsers from interacting directly, they will nest.
For instance, lexeme[a >> b] >> c is a seq_parser containing two subparsers, lexeme[a >> b] and c. This is because lexeme[] takes its given parser and wraps it in a lexeme_parser. This in turn disables the sequence-parser combining logic, since the two operands of the second operator>> in lexeme[a >> b] >> c are not both seq_parsers. Sequence parsers have several rules that govern what the overall attribute type of the parser is, based on the positions and attributes of its subparsers (see Attribute Generation). Therefore, it's important to know which directives create a new parser (and what kind), and which ones do not; this is indicated for each directive below.
repeat()
See The Parsers And Their Uses. Creates a repeat_parser.

if_()
See The Parsers And Their Uses. Creates a seq_parser.
omit[p] disables attribute generation for the parser p. Not only does omit[p] have no attribute, but any attribute-generation work that normally happens within p is skipped.

This directive can be useful in cases like this: say you have some fairly complicated parser p that generates a large and expensive-to-construct attribute. Now say that you want to write a function that just counts how many times p can match a string (where the matches are non-overlapping). Instead of using p directly, and building all those attributes, or rewriting p without the attribute generation, use omit[].
Creates an omit_parser.
raw[p] changes the attribute of p from ATTR(p) to a view that delimits the subrange of the input that was matched by p. The type of the view is subrange<I>, where I is the type of the iterator used within the parse. Note that this may not be the same as the iterator type passed to parse(). For instance, when parsing UTF-8, the iterator passed to parse() may be char8_t const *, but within the parse it will be a UTF-8 to UTF-32 transcoding (converting) iterator. Just like omit[], raw[] causes all attribute-generation work within p to be skipped.

Similar to the re-use scenario for omit[] above, raw[] could be used to find the locations of all non-overlapping matches of p in a string.
Creates a raw_parser.
string_view[p] is very similar to raw[p], except that it changes the attribute of p to std::basic_string_view<C>, where C is the character type of the underlying range being parsed. string_view[] requires that the underlying range being parsed is contiguous. Since this can only be detected in C++20 and later, string_view[] is not available in C++17 mode.

Similar to the re-use scenario for omit[] above, string_view[] could be used to find the locations of all non-overlapping matches of p in a string. Whether raw[] or string_view[] is more natural to use to report the locations depends on your use case, but they are essentially the same.

Creates a string_view_parser.
no_case[p] enables case-insensitive parsing within the parse of p. This applies to the text parsed by char_(), string(), and bool_ parsers. The number parsers are already case-insensitive. The case-insensitivity is achieved by doing Unicode case folding on the text being parsed and the values in the parser being matched (see the note below if you want to know more about Unicode case folding). In the non-Unicode code path, a full Unicode case folding is not done; instead, only the transformations of values less than 0x100 are done. Examples:
#include <boost/parser/transcode_view.hpp> // For as_utfN.

namespace bp = boost::parser;

auto const street_parser = bp::string(u8"Tobias Straße");
assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

auto const alpha_parser = bp::no_case[bp::char_('a', 'z')];
assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
Everything pretty much does what you'd naively expect inside no_case[], except that the two-character range version of char_ has a limitation. It only compares a code point from the input to its two arguments (e.g. 'a' and 'z' in the example above). It does not do anything special for multi-code point case folding expansions. For instance, char_(U'ß', U'ß') matches the input U"s", which makes sense, since U'ß' expands to U"ss". However, that same parser does not match the input U"ß"! In short, stick to pairs of code points that have single-code point case folding expansions. If you need to support the multi-expanding code points, use the other overload, like: char_(U"abcd/*...*/ß").
Note: Unicode case folding is an operation that makes text uniformly one case; if you case fold two bits of text, you can compare the results directly to see whether the original texts are the same, ignoring case.
Creates a no_case_parser.
lexeme[p] disables use of the skipper, if a skipper is being used, within the parse of p. This is useful, for instance, if you want to enable skipping in most parts of your parser, but disable it only in one section where it doesn't belong. If you are skipping whitespace in most of your parser, but want to parse strings that may contain spaces, you should use lexeme[]:
namespace bp = boost::parser;

auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];
Without lexeme[], our string parser would correctly match "foo bar", but the generated attribute would be "foobar".
Creates a lexeme_parser.
skip[] is like the inverse of lexeme[]. It enables skipping in the parse, even if it was not enabled before. For example, within a call to parse() that uses a skipper, let's say we have these parsers in use:
namespace bp = boost::parser;

auto const one_or_more = +bp::char_;
auto const skip_or_skip_not_there_is_no_try =
    bp::lexeme[bp::skip[one_or_more] >> one_or_more];
The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.
skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:
namespace bp = boost::parser;

auto const zero_or_more = *bp::char_;
auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];
The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.

Creates a skip_parser.
merge[], separate[], and transform(f)[] influence the generation of attributes; see the Attribute Generation section for more details on them. merge[] and separate[] create a copy of the given seq_parser. transform(f)[] creates a transform_parser.