So far, we've seen several different types of attributes that come from different parsers: int for int_, boost::parser::tuple<char, int> for boost::parser::char_ >> boost::parser::int_, etc. Let's get into how this works with more rigor.
Note | |
---|---|
Some parsers have no attribute at all. In the tables below, the type of
the attribute is listed as "None." There is a non- |
Warning | |
---|---|
Boost.Parser assumes that all attributes are semi-regular (see |
You can use attribute (and the associated alias, attribute_t) to determine the attribute a parser would have if it were passed to parse(). Since at least one parser (char_) has a polymorphic attribute type, attribute also takes the type of the range being parsed. If a parser produces no attribute, attribute will produce none, not void.
If you want to feed an iterator/sentinel pair to attribute, create a range from it like so:

    constexpr auto parser = /* ... */;
    auto first = /* ... */;
    auto const last = /* ... */;

    namespace bp = boost::parser;
    // You can of course use std::ranges::subrange directly in C++20 and later.
    using attr_type = bp::attribute_t<decltype(BOOST_PARSER_SUBRANGE(first, last)), decltype(parser)>;

There is no single attribute type for any parser, since a parser can be placed within omit[], which makes its attribute type none. Therefore, attribute cannot tell you what attribute your parser will produce under all circumstances; it only tells you what it would produce if it were passed to parse().
This table summarizes the attributes generated for all Boost.Parser parsers. In the table below:
- RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses); and
- x and y represent arbitrary objects.
Table 1.8. Parsers and Their Attributes
Parser |
Attribute Type |
Notes |
---|---|---|
None. |
||
None. |
||
None. |
||
|
|
|
The code point type in Unicode parsing, or |
Includes all the |
|
|
||
|
||
|
None. |
Includes all the |
|
|
Includes all the |
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.

Here, we're parsing plain chars, meaning that the parsing is in the non-Unicode code path, so the attribute of char_ is char:

    auto result = parse("some text", boost::parser::char_);
    static_assert(std::is_same_v<decltype(result), std::optional<char>>);

When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:

    auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_);
    static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);

The good news is that usually you don't parse characters individually. When you parse with char_, you usually parse repetitions of them, which will produce a std::string, regardless of whether you're in Unicode parsing mode or not. If you do need to parse individual characters, and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.
Combining operations of course affect the generation of attributes. In the tables below:
- m and n are parse arguments that resolve to integral values;
- pred is a parse predicate;
- arg0, arg1, arg2, ... are parse arguments;
- a is a semantic action; and
- p, p1, p2, ... are parsers that generate attributes.
Table 1.9. Combining Operations and Their Attributes
Parser |
Attribute Type |
---|---|
|
None. |
|
None. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
None. |
|
|
|
|
|
|
|
|
Important | |
---|---|
All the character parsers, like |
Important | |
---|---|
In case you did not notice it above, adding a semantic action to a parser
erases the parser's attribute. The attribute is still available inside
the semantic action as |
There are a relatively small number of rules that define how the attributes of sequence parsers and alternative parsers are generated. (Don't worry, there are examples below.)
The attribute generation behavior of sequence parsers is conceptually pretty simple. In particular, if the result is boost::parser::tuple<T> (even if T is a type that means "no attribute"), the attribute becomes T.
More formally, the attribute generation algorithm works like this. For a sequence parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. We get the attribute of p by evaluating a compile-time left fold operation, left-fold({a1, a2, ..., an}, tuple<a0>, OP). OP is the combining operation that takes the current attribute type (initially boost::parser::tuple<a0>) and the next attribute type, and returns the new current attribute type. The current attribute type at the end of the fold operation is the attribute type for p.
OP attempts to apply a series of rules, one at a time. The rules are noted as X >> Y -> Z, where X is the type of the current attribute, Y is the type of the next attribute, and Z is the new current attribute type. In these rules, C<T> is a container of T; none is a special type that indicates that there is no attribute; T is a type; CHAR is a character type, either char or char32_t; and Ts... is a parameter pack of one or more types. Note that T may be the special type none. The current attribute is always a tuple (call it Tup), so the "current attribute X" refers to the last element of Tup, not Tup itself, except for those rules that explicitly mention boost::parser::tuple<> as part of X's type.
- none >> T -> T
- CHAR >> CHAR -> std::string
- T >> none -> T
- C<T> >> T -> C<T>
- T >> C<T> -> C<T>
- C<T> >> optional<T> -> C<T>
- optional<T> >> C<T> -> C<T>
- boost::parser::tuple<none> >> T -> boost::parser::tuple<T>
- boost::parser::tuple<Ts...> >> T -> boost::parser::tuple<Ts..., T>
The rules that combine containers with (possibly optional) adjacent values (e.g. C<T> >> optional<T> -> C<T>) have a special case for strings. If C<T> is exactly std::string, and T is either char or char32_t, the combination yields a std::string.
Again, if the final result is that the attribute is boost::parser::tuple<T>, the attribute becomes T.
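The fold described above can be modeled at the type level. The following is only a sketch of the rules as they are written here, not Boost.Parser's implementation: std::tuple stands in for boost::parser::tuple, and none, combine, step, fold, finish, and seq_attr_t are hypothetical names invented for this illustration. A type counts as a container here if it has a value_type and a suitable insert() member, which is an assumption loosely based on the container note.

```cpp
#include <optional>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

struct none {};             // stand-in for the "no attribute" type (assumption)
struct no_rule {};          // marker: no merging rule applied
struct not_a_container {};  // marker: type has no element type
struct not_optional {};     // marker: type is not a std::optional
template<class T> struct tag { using type = T; };

template<class T>
constexpr bool is_char = std::is_same_v<T, char> || std::is_same_v<T, char32_t>;

// elem<C>::type is C::value_type when C has value_type and insert(iterator, value).
template<class C, class = void> struct elem { using type = not_a_container; };
template<class C>
struct elem<C, std::void_t<decltype(std::declval<C &>().insert(
                   std::declval<C &>().begin(), std::declval<typename C::value_type>()))>> {
    using type = typename C::value_type;
};

template<class T> struct unwrap_opt { using type = not_optional; };
template<class T> struct unwrap_opt<std::optional<T>> { using type = T; };

// True if C is a container of T, including the std::string + character special case.
template<class C, class T>
constexpr bool holds = std::is_same_v<typename elem<C>::type, T> ||
                       (std::is_same_v<C, std::string> && is_char<T>);

// Tries the pairwise rules in the order listed; X is the last tuple element.
template<class X, class Y>
constexpr auto combine() {
    using OX = typename unwrap_opt<X>::type;
    using OY = typename unwrap_opt<Y>::type;
    if constexpr (std::is_same_v<X, none>) return tag<Y>{};                // none >> T -> T
    else if constexpr (is_char<X> && is_char<Y>) return tag<std::string>{};  // CHAR >> CHAR
    else if constexpr (std::is_same_v<Y, none>) return tag<X>{};           // T >> none -> T
    else if constexpr (holds<X, Y>) return tag<X>{};                       // C<T> >> T
    else if constexpr (holds<Y, X>) return tag<Y>{};                       // T >> C<T>
    else if constexpr (holds<X, OY>) return tag<X>{};                      // C<T> >> optional<T>
    else if constexpr (holds<Y, OX>) return tag<Y>{};                      // optional<T> >> C<T>
    else return tag<no_rule>{};                                            // stays separate
}

// Fold state: the finished tuple elements plus the current last element.
template<class Prefix, class Last> struct state {};

template<class State, class Y> struct step;
template<class... Ps, class X, class Y>
struct step<state<std::tuple<Ps...>, X>, Y> {
    using merged = typename decltype(combine<X, Y>())::type;
    using type = std::conditional_t<std::is_same_v<merged, no_rule>,
                                    state<std::tuple<Ps..., X>, Y>,  // tuple<Ts...> >> T
                                    state<std::tuple<Ps...>, merged>>;
};

template<class State, class... Rest> struct fold { using type = State; };
template<class State, class Y, class... Rest>
struct fold<State, Y, Rest...> : fold<typename step<State, Y>::type, Rest...> {};

template<class State> struct finish;
template<class X>
struct finish<state<std::tuple<>, X>> { using type = X; };  // tuple<T> becomes T
template<class P, class... Ps, class X>
struct finish<state<std::tuple<P, Ps...>, X>> { using type = std::tuple<P, Ps..., X>; };

template<class A0, class... As>
using seq_attr_t = typename finish<typename fold<state<std::tuple<>, A0>, As...>::type>::type;

static_assert(std::is_same_v<seq_attr_t<char, char>, std::string>);
static_assert(std::is_same_v<seq_attr_t<int, int>, std::tuple<int, int>>);
```

Under this model, a list such as none, int, none, std::string, none, std::string folds to a tuple of int and two separate strings, because no rule merges two std::string attributes.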
Note | |
---|---|
What constitutes a container in the rules above is determined by the
template<typename T> concept container = std::ranges::common_range<T> && requires(T t) { { t.insert(t.begin(), *t.begin()) } -> std::same_as<std::ranges::iterator_t<T>>; };
|
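To get a feel for which types satisfy a requirement shaped like the concept in the note, here is a C++17 approximation written with the detection idiom. It is an illustration only: is_container_like is a hypothetical name, and comparing member begin() and end() types is a rough stand-in for std::ranges::common_range, so edge cases may differ from the real concept.

```cpp
#include <array>
#include <set>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Helper aliases; each is ill-formed (and so SFINAE-friendly) when T lacks the member.
template<class T> using begin_t = decltype(std::declval<T &>().begin());
template<class T> using end_t = decltype(std::declval<T &>().end());
template<class T>
using insert_t = decltype(std::declval<T &>().insert(
    std::declval<T &>().begin(), *std::declval<T &>().begin()));

template<class T, class = void>
struct is_container_like : std::false_type {};

// Requires begin()/end() of the same type, and insert(begin, *begin) returning an iterator.
template<class T>
struct is_container_like<T, std::void_t<begin_t<T>, end_t<T>, insert_t<T>>>
    : std::bool_constant<std::is_same_v<begin_t<T>, end_t<T>> &&
                         std::is_same_v<insert_t<T>, begin_t<T>>> {};

template<class T>
constexpr bool is_container_like_v = is_container_like<T>::value;

static_assert(is_container_like_v<std::vector<int>>);
static_assert(is_container_like_v<std::string>);
static_assert(!is_container_like_v<std::array<int, 3>>);  // no insert() member
```

Note that std::array does not qualify, since it has no insert() member, while a type like std::set<int> does, because its hinted insert() has the required shape.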
The rules for alternative parsers are much simpler. For an alternative parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. The attribute of p is std::variant<a0, a1, a2, ..., an>, with the following steps applied:
- all the none attributes are left out, and if any are, the attribute is wrapped in a std::optional, like std::optional<std::variant</*...*/>>;
- duplicates in the std::variant template parameters <T1, T2, ... Tn> are removed; every type that appears does so exactly once;
- if the attribute is std::variant<T> or std::optional<std::variant<T>>, the attribute becomes instead T or std::optional<T>, respectively; and
- if the attribute is std::variant<> or std::optional<std::variant<>>, the result becomes none instead.
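The steps for alternative parsers can also be sketched as a small type-level computation. As before, this is a model of the rules as stated, not the library's code; none, list, push_unique, collect, collapse, and alt_attr_t are hypothetical names used only for this illustration.

```cpp
#include <optional>
#include <string>
#include <type_traits>
#include <variant>

struct none {};  // stand-in for the "no attribute" type (assumption)

template<class... Ts> struct list {};

// Appends T to the list unless T is none or is already present.
template<class List, class T> struct push_unique;
template<class... Ts, class T>
struct push_unique<list<Ts...>, T> {
    using type = std::conditional_t<
        std::is_same_v<T, none> || (std::is_same_v<T, Ts> || ...),
        list<Ts...>, list<Ts..., T>>;
};

// Walks the subparser attribute types left to right, keeping unique non-none types.
template<class List, class... As> struct collect { using type = List; };
template<class List, class A, class... As>
struct collect<List, A, As...> : collect<typename push_unique<List, A>::type, As...> {};

// variant<> becomes none, variant<T> becomes T, anything longer stays a variant.
template<class List> struct collapse;
template<> struct collapse<list<>> { using type = none; };
template<class T> struct collapse<list<T>> { using type = T; };
template<class T, class U, class... Ts>
struct collapse<list<T, U, Ts...>> { using type = std::variant<T, U, Ts...>; };

template<class... As>
struct alt_attr {
    using core = typename collapse<typename collect<list<>, As...>::type>::type;
    static constexpr bool any_none = (std::is_same_v<As, none> || ...);
    // Wrap in std::optional only if a none was dropped and something is left.
    using type = std::conditional_t<any_none && !std::is_same_v<core, none>,
                                    std::optional<core>, core>;
};
template<class... As> using alt_attr_t = typename alt_attr<As...>::type;

static_assert(std::is_same_v<alt_attr_t<int, double>, std::variant<int, double>>);
static_assert(std::is_same_v<alt_attr_t<int, none>, std::optional<int>>);
```

In this model, an alternative of int and int collapses to plain int, and an alternative whose subparsers all lack attributes yields none.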
The rule for forming containers from non-containers is simple. You get a vector from any of the repeating parsers, like +p, *p, repeat(3)[p], etc. The value type of the vector is ATTR(p).
Another rule for sequence containers is that a value x and a container c containing elements of x's type will form a single container. However, x's type must be exactly the same as the elements in c.
There is an exception to this in the special case for strings and characters noted above. For instance, consider the attribute of char_ >> string("str"). In the non-Unicode code path, char_'s attribute type is guaranteed to be char, so ATTR(char_ >> string("str")) is std::string. If you are parsing UTF-8 in the Unicode code path, char_'s attribute type is char32_t, and the special rule makes it also produce a std::string. Otherwise, ATTR(char_ >> string("str")) would be boost::parser::tuple<char32_t, std::string>.
Again, there are no special rules for combining values and containers. Every combination results from an exact match, or falls into the string+character special case.
std::string assignment
std::string can be assigned from a char. This is dumb. But, we're stuck with it. When you write a parser with a char attribute, and you try to parse it into a std::string, you've almost certainly made a mistake. More importantly, if you write this:

    namespace bp = boost::parser;
    std::string result;
    auto b = bp::parse("3", bp::int_, bp::ws, result);

... you are even more likely to have made a mistake. Though this should work, because the assignment in std::string s; s = 3; is well-formed, Boost.Parser forbids it. If you write parsing code like the snippet above, you will get a static assertion. If you really do want to assign a float or whatever to a std::string, do it in a semantic action.
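The standard-library behavior that motivates this restriction can be checked without Boost.Parser at all. The snippet below is a small standalone sketch; assign_from_int is a hypothetical helper written for this illustration.

```cpp
#include <string>
#include <type_traits>

// std::string has an operator=(char) overload, and arithmetic types convert
// implicitly to char, so all of these assignments are well-formed.
static_assert(std::is_assignable_v<std::string &, char>);
static_assert(std::is_assignable_v<std::string &, int>);
static_assert(std::is_assignable_v<std::string &, float>);

// Hypothetical helper: performs the same assignment as `std::string s; s = 3;`.
inline std::string assign_from_int(int value) {
    std::string s;
    s = value;  // compiles; the int is converted to a single char
    return s;
}
```

The result is a one-character string holding the converted value, not the text "3", which is why a parser that silently did this would almost never be what you meant.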
In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.
Table 1.10. Sequence and Alternative Combining Operations and Their Attributes
Expression |
Attribute Type |
---|---|
None. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
None. |
|
|
|
|
|
|
|
|
|
|
As we saw in the previous Parsing into structs and classes section, if you parse two strings in a row, you get two separate strings in the resulting attribute. The parser from that example was this:

    namespace bp = boost::parser;
    auto employee_parser = bp::lit("employee")
        >> '{'
        >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_
        >> '}';

employee_parser's attribute is boost::parser::tuple<int, std::string, std::string, double>. The two quoted_string parsers produce std::string attributes, and those attributes are not combined. That is the default behavior, and it is just what we want for this case; we don't want the first and last name fields to be jammed together such that we can't tell where one name ends and the other begins.
What if we were parsing some string that consisted of a prefix and a suffix, and the prefix and suffix were defined separately for reuse elsewhere?

    namespace bp = boost::parser;
    auto prefix = /* ... */;
    auto suffix = /* ... */;
    auto special_string = prefix >> suffix;
    // Continue to use prefix and suffix to make other parsers....

In this case, we might want to use these separate parsers, but want special_string to produce a single std::string for its attribute. merge[] exists for this purpose.

    namespace bp = boost::parser;
    auto prefix = /* ... */;
    auto suffix = /* ... */;
    auto special_string = bp::merge[prefix >> suffix];

merge[] only applies to sequence parsers (like p1 >> p2), and forces all subparsers in the sequence parser to use the same variable for their attribute.
Another directive, separate[], also applies only to sequence parsers, but does the opposite of merge[]. It forces all the attributes produced by the subparsers of the sequence parser to stay separate, even if they would have combined. For instance, consider this parser.

    namespace bp = boost::parser;
    auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;

string_and_char matches one or more 'a's, followed by a space and then some other character. As written above, string_and_char produces a std::string, and the final character is appended to the string, after all the 'a's. However, if you wanted to store the final character as a separate value, you would use separate[].

    namespace bp = boost::parser;
    auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];

With this change, string_and_char produces the attribute boost::parser::tuple<std::string, char32_t>.
As mentioned previously, merge[] applies only to sequence parsers. All subparsers must have the same attribute, or produce no attribute at all. At least one subparser must produce an attribute. When you use merge[], you create a combining group. Every parser in a combining group uses the same variable for its attribute. No parser in a combining group interacts with the attributes of any parsers outside of its combining group. Combining groups are disjoint; merge[/*...*/] >> merge[/*...*/] will produce a tuple of two attributes, not one.
separate[] also applies only to sequence parsers. When you use separate[], you disable interaction of all the subparsers' attributes with adjacent attributes, whether they are inside or outside the separate[] directive; you force each subparser to have a separate attribute.
The rules for merge[] and separate[] overrule the steps of the algorithm described above for combining the attributes of a sequence parser. Consider an example.

    namespace bp = boost::parser;
    constexpr auto parser =
        bp::char_ >> bp::merge[(bp::string("abc") >> bp::char_ >> bp::char_) >> bp::string("ghi")];

You might think that ATTR(parser) would be bp::tuple<char, std::string>. It is not. The parser above does not even compile. Since we created a merge group above, we disabled the default behavior in which the char_ parsers would have collapsed into the string parser that preceded them. Since they are all treated as separate entities, and since they have different attribute types, the use of merge[] is an error.
Many directives create a new parser out of the parser they are given. merge[] and separate[] do not. Since they operate only on sequence parsers, all they do is create a copy of the sequence parser they are given. The seq_parser template has a template parameter CombiningGroups, and all merge[] and separate[] do is take a given seq_parser and create a copy of it with a different CombiningGroups template parameter. This means that merge[] and separate[] can be ignored in operator>> expressions much like parentheses are. Consider an example.

    namespace bp = boost::parser;
    constexpr auto parser1 = bp::separate[bp::int_ >> bp::int_] >> bp::int_;
    constexpr auto parser2 = bp::lexeme[bp::int_ >> ' ' >> bp::int_] >> bp::int_;

Note that separate[] is a no-op here; it's only being used this way for this example. These parsers have different attribute types. ATTR(parser1) is boost::parser::tuple<int, int, int>. ATTR(parser2) is boost::parser::tuple<boost::parser::tuple<int, int>, int>. This is because bp::lexeme[] wraps its given parser in a new parser. separate[], like merge[], does not. That's why, even though parser1 and parser2 look so structurally similar, they have different attributes.
transform(f)[]
transform(f)[] is a directive that transforms the attribute of a parser using the given function f. For example:

    auto str_sum = [&](std::string const & s) {
        int retval = 0;
        for (auto ch : s) {
            retval += ch - '0';
        }
        return retval;
    };

    namespace bp = boost::parser;
    constexpr auto parser = +bp::char_;
    std::string str = "012345";

    auto result = bp::parse(str, bp::transform(str_sum)[parser]);
    assert(result);
    assert(*result == 15);
    static_assert(std::is_same_v<decltype(result), std::optional<int>>);

Here, we have a function str_sum that we use for f. It assumes each character in the given std::string s is a digit, and returns the sum of all the digits in s. Our parser parser would normally return a std::string. However, since str_sum returns a different type, int, that is the attribute type of the full parser, bp::transform(str_sum)[parser], as you can see from the static_assert.
As is the case with attributes all throughout Boost.Parser, the attribute passed to f will be moved. You can take it by const &, &&, or by value.
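Why those three parameter forms work follows from ordinary C++ binding rules, which can be checked with the standard type traits alone. This sketch assumes that "moved" means f receives the attribute as an rvalue; the four functions are hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical callables that differ only in how they take the attribute.
inline int by_const_ref(std::string const & s) { return static_cast<int>(s.size()); }
inline int by_rvalue_ref(std::string && s) { return static_cast<int>(s.size()); }
inline int by_value(std::string s) { return static_cast<int>(s.size()); }
inline int by_mutable_ref(std::string & s) { return static_cast<int>(s.size()); }

// An rvalue std::string binds to const &, to &&, and to a by-value parameter.
static_assert(std::is_invocable_r_v<int, decltype(&by_const_ref), std::string &&>);
static_assert(std::is_invocable_r_v<int, decltype(&by_rvalue_ref), std::string &&>);
static_assert(std::is_invocable_r_v<int, decltype(&by_value), std::string &&>);

// A non-const lvalue reference cannot bind to an rvalue.
static_assert(!std::is_invocable_v<decltype(&by_mutable_ref), std::string &&>);

// The callable's return type is what the text above says becomes the attribute type.
static_assert(std::is_same_v<std::invoke_result_t<decltype(&by_value), std::string>, int>);
```

Under that assumption, a callable taking std::string & would be the one form that cannot accept the attribute.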
No distinction is made between parsers with and without an attribute, because there is a regular, special no-attribute type that is generated by parsers with no attribute. You may therefore write something like transform(f)[eps], and Boost.Parser will happily call f with this special no-attribute type.
omit[p] disables attribute generation for the parser p. raw[p] changes the attribute from ATTR(p) to a view that indicates the subrange of the input that was matched by p. string_view[p] is just like raw[p], except that it produces std::basic_string_views. See Directives for details.