There are multiple top-level parse functions. They have some things in common:

- They each return a value contextually convertible to `bool`.
- They each accept ranges of any of the character types `char`, `wchar_t`, `char8_t`, `char16_t`, or `char32_t`.
- The overloads with `prefix_` in their name take an iterator/sentinel pair. For example, `prefix_parse(first, last, p, ws)` parses the range `[first, last)`, advancing `first` as it goes. If the parse succeeds, the entire input may or may not have been matched. The value of `first` will indicate the last location within the input that `p` matched. The whole input was matched if and only if `first == last` after the call to `prefix_parse()`.
- When you call any of the range overloads of `parse()`, for example `parse(r, p, ws)`, `parse()` only indicates success if all of `r` was matched by `p`.
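There are eight overloads of `parse()` and `prefix_parse()` combined, because there are three either/or options in how you call them: with an iterator/sentinel pair or with a range, with or without an attribute out-parameter, and with or without a skipper.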
You can call `prefix_parse()` with an iterator and sentinel that delimit a range of character values. For example:

```cpp
namespace bp = boost::parser;
auto const p = /* some parser ... */;

char const * str_1 = /* ... */;
// Using null_sentinel, str_1 can point to three billion characters, and
// we can call prefix_parse() without having to find the end of the string first.
auto result_1 = bp::prefix_parse(str_1, bp::null_sentinel, p, bp::ws);

char str_2[] = /* ... */;
// prefix_parse() advances its first argument, so it must be an lvalue.
auto first = std::begin(str_2);
auto result_2 = bp::prefix_parse(first, std::end(str_2), p, bp::ws);
```
The iterator/sentinel overloads can parse successfully without matching the entire input. You can tell if the entire input was matched by checking whether `first == last` is true after `prefix_parse()` returns.
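The iterator/sentinel contract can be modeled without Boost.Parser at all. The sketch below uses only the standard library. `prefix_parse_digits` is a hypothetical function made up for this illustration and is not part of Boost.Parser. Like `prefix_parse()`, it advances `first` past whatever it matched, so comparing `first` with `last` afterward tells you whether the whole input was consumed.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper (illustration only, not Boost.Parser): matches one or
// more decimal digits at the front of [first, last). On success it advances
// `first` past the digits it consumed and returns their value; on failure it
// leaves `first` untouched and returns std::nullopt.
template <typename Iter, typename Sentinel>
std::optional<int> prefix_parse_digits(Iter & first, Sentinel last)
{
    Iter it = first;
    int value = 0;
    bool matched = false;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it))) {
        value = value * 10 + (*it - '0');
        ++it;
        matched = true;
    }
    if (!matched)
        return std::nullopt;
    first = it; // report how far the match got
    return value;
}
```

For the input `"123abc"`, the call succeeds with `123` and leaves `first` pointing at the `a`, so `first != last`. For `"42"`, it leaves `first == last`.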
By contrast, you call `parse()` with a range of character values. When the range is a reference to an array of characters, any terminating `0` is ignored. This allows calls like `parse("str", p)` to work naturally.

```cpp
namespace bp = boost::parser;
auto const p = /* some parser ... */;

std::u8string str_1 = u8"str";
auto result_1 = bp::parse(str_1, p, bp::ws);

// The null terminator is ignored. This call parses s-t-r, not s-t-r-0.
auto result_2 = bp::parse(U"str", p, bp::ws);

char const * str_3 = "str";
auto result_3 = bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws);
```
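The array rule can be sketched in standard C++. This is only an assumed model of the behavior described above, not Boost.Parser's code, and `as_range` is a hypothetical name. Taking the array by reference keeps its length `N` available, so a single trailing `0` can be dropped when present.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (illustration only): views a character array as a
// range, dropping one trailing 0 so that the literal "str" is seen as
// s-t-r rather than s-t-r-0. Arrays without a trailing 0 are kept whole.
template <typename CharT, std::size_t N>
constexpr std::basic_string_view<CharT> as_range(CharT const (&arr)[N])
{
    std::size_t size = N;
    if (size != 0 && arr[size - 1] == CharT(0))
        --size;
    return std::basic_string_view<CharT>(arr, size);
}
```

Both `as_range("str")` and `as_range(U"str")` have size 3. An array with no terminating `0` keeps all its elements.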
Since there is no way to indicate that `p` matches the input but only a prefix of the input was matched, the range (non-iterator/sentinel) overloads of `parse()` indicate failure if the entire input is not matched.
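The relationship between the two families can be modeled as a full-match wrapper around a prefix matcher. In this hypothetical standard-C++ sketch, `prefix_match_as` and `full_match_as` are names invented for illustration. A successful prefix match still counts as failure for the range-style wrapper unless it reached the end of the input.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical prefix matcher (illustration only): consumes a leading run of
// 'a' characters from [first, last) and succeeds if it consumed at least one.
inline bool prefix_match_as(std::string_view::const_iterator & first,
                            std::string_view::const_iterator last)
{
    auto it = first;
    while (it != last && *it == 'a')
        ++it;
    if (it == first)
        return false;
    first = it;
    return true;
}

// Range-style wrapper: succeeds only if the prefix match consumed everything,
// mirroring how the range overloads of parse() treat a partial match.
inline bool full_match_as(std::string_view input)
{
    auto first = input.begin();
    return prefix_match_as(first, input.end()) && first == input.end();
}
```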
The second option is whether you pass an attribute out-parameter:

```cpp
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"";

std::string result_1;
bool const success = bp::parse(str, p, result_1); // success is true; result_1 is "two words"
auto result_2 = bp::parse(str, p);                // !!result_2 is true; *result_2 is "two words"
```
When you call `parse()` with an attribute out-parameter and parser `p`, the expected type is something like `ATTR(p)`. It doesn't have to be exactly that; I'll explain in a bit. The return type is `bool`.
When you call `parse()` without an attribute out-parameter and parser `p`, the return type is `std::optional<ATTR(p)>`. Note that when `ATTR(p)` is itself an `optional`, the return type is `std::optional<std::optional<...>>`. Each of those optionals tells you something different. The outer one tells you whether the parse succeeded. If so, the parser was successful, but it still generates an attribute that is an `optional`; that's the inner one.
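The two levels of `optional` can be seen in a hand-written parser whose attribute is itself optional. The sketch below is plain C++, not Boost.Parser. `parse_opt_int_semicolon` is a hypothetical function that parses an optional integer followed by a required `;`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical parser for "an optional integer, then ';'" (illustration only).
// Outer optional: did the whole parse succeed?
// Inner optional: the attribute of the optional integer, which may be empty.
inline std::optional<std::optional<int>> parse_opt_int_semicolon(std::string_view in)
{
    std::size_t i = 0;
    std::optional<int> value;
    while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i]))) {
        value = (value ? *value * 10 : 0) + (in[i] - '0');
        ++i;
    }
    if (i + 1 != in.size() || in[i] != ';')
        return std::nullopt; // parse failed: the outer optional is empty
    // Parse succeeded: the outer optional is engaged, the inner one may be empty.
    return std::optional<std::optional<int>>(std::in_place, value);
}
```

For `";"` the outer optional is engaged because the parse succeeded, but the inner one is empty because no integer was present. For `"42"` the outer one is empty because the required `;` is missing.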
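The third option is whether you pass a skipper. With `bp::ws` as the skipper, whitespace in the input is skipped rather than matched by `p`, so the space between the two words does not become part of the attribute:

```cpp
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"";

auto result_1 = bp::parse(str, p);         // !!result_1 is true; *result_1 is "two words"
auto result_2 = bp::parse(str, p, bp::ws); // !!result_2 is true; *result_2 is "twowords"
```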
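To see where `"twowords"` comes from, here is a hand-written, standard-C++ model of the quoted-string parser with an optional whitespace skipper. It is a simplified sketch, not how Boost.Parser is implemented, and `quoted` is a hypothetical name. With skipping enabled, whitespace before each element is consumed by the skipper, so the space inside the quotes never reaches the attribute.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical model of '"' >> *(char_ - '"') >> '"' (illustration only).
// If skip_ws is true, whitespace is skipped before every element is matched.
// Like the range overloads of parse(), all of the input must be consumed.
inline std::optional<std::string> quoted(std::string_view in, bool skip_ws)
{
    std::size_t i = 0;
    auto skip = [&] {
        while (skip_ws && i < in.size() &&
               std::isspace(static_cast<unsigned char>(in[i])))
            ++i;
    };
    skip();
    if (i == in.size() || in[i] != '"')
        return std::nullopt;
    ++i; // consume the opening quote
    std::string attr;
    for (;;) {
        skip();
        if (i == in.size())
            return std::nullopt; // no closing quote
        if (in[i] == '"')
            break;
        attr += in[i++];
    }
    ++i; // consume the closing quote
    skip();
    if (i != in.size())
        return std::nullopt;
    return attr;
}
```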
For any call to `parse()` that takes an attribute out-parameter, like `parse("str", p, bp::ws, out)`, the call is well-formed for a number of possible types of `out`; `decltype(out)` does not need to be exactly `ATTR(p)`.

For instance, this is well-formed code that does not abort (remember that the attribute type of `string()` is `std::string`):

```cpp
namespace bp = boost::parser;
auto const p = bp::string("foo");

std::vector<char> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::vector<char>({'f', 'o', 'o'}));
```
Even though `p` generates a `std::string` attribute, when it actually takes the data it generates and writes it into an attribute, it only assumes that the attribute is a `container` (see Concepts), not that it is some particular container type. It will happily `insert()` into a `std::string` or a `std::vector<char>` all the same. `std::string` and `std::vector<char>` are both containers of `char`, but it will also insert into a container with a different element type. `p` just needs to be able to insert the elements it produces into the attribute-container. As long as an implicit conversion allows that to work, everything is fine:
```cpp
namespace bp = boost::parser;
auto const p = bp::string("foo");

std::deque<int> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::deque<int>({'f', 'o', 'o'}));
```
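The requirement described above can be modeled in a few lines of standard C++: the destination only needs to be a container whose `insert()` accepts the produced elements through an implicit conversion. `append_chars` is a hypothetical helper and does not reproduce Boost.Parser's code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <string_view>
#include <vector>

// Simplified model (illustration only): write the characters a parser
// produced into any container, relying on implicit conversion from char
// to the container's element type.
template <typename Container>
void append_chars(std::string_view produced, Container & out)
{
    for (char c : produced)
        out.insert(out.end(), c); // char converts implicitly to the element type
}
```

The same call works for `std::string`, `std::vector<char>`, and `std::deque<int>`. Only the element conversion has to be valid.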
This works, too, even though it requires inserting elements from a generated sequence of `char32_t` into a container of `char` (remember that the attribute type of `+cp` is `std::vector<char32_t>`):

```cpp
namespace bp = boost::parser;
auto const p = +bp::cp;

std::string result;
bool const success = bp::parse("foo", p, result);
assert(success && result == "foo");
```
This next example works as well, even though the change to a container is not at the top level. It is an element of the result tuple:
```cpp
namespace bp = boost::parser;
auto const p = +(bp::cp - ' ') >> ' ' >> bp::string("foo");

using attr_type = decltype(bp::parse(u8"", p));
static_assert(std::is_same_v<
              attr_type,
              std::optional<bp::tuple<std::string, std::string>>>);

using namespace bp::literals;

{
    // This is similar to attr_type, with the first std::string changed to a std::vector<int>.
    bp::tuple<std::vector<int>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    assert(bp::get(result, 0_c) == std::vector<int>({'r', U'ô', 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}
{
    // This time, we have a std::vector<char> instead of a std::vector<int>.
    bp::tuple<std::vector<char>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    // The 4 code points "rôle" get transcoded to 5 UTF-8 code units to fit in the std::vector<char>.
    assert(bp::get(result, 0_c) == std::vector<char>({'r', (char)0xc3, (char)0xb4, 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}
```
As indicated in the inline comments, there are a couple of things to take away from this example:

- If you change the type of an attribute out-parameter (for instance, from `std::string` to `std::vector<int>`, or from `std::vector<char32_t>` to `std::deque<int>`), the call to `parse()` will often still be well-formed.
- When the replaced container's element type is `char32_t` (or `wchar_t` for non-MSVC builds) and the new container's element type is `char` or `char8_t`, Boost.Parser assumes that this is a UTF-32-to-UTF-8 conversion, and silently transcodes the data when inserting into the new container.
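The transcoding mentioned in the second point is ordinary UTF-8 encoding. The standard-C++ sketch below uses hypothetical helpers `append_utf8` and `transcode_into` and assumes valid Unicode scalar values as input. Boost.Parser uses its own internal machinery for this step. The sketch shows why the four code points of "rôle" become five `char` elements: U+00F4 needs two UTF-8 code units, `0xC3 0xB4`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal UTF-8 encoder (illustration only). Assumes cp is a valid Unicode
// scalar value: not a surrogate, and at most U+10FFFF.
template <typename Container>
void append_utf8(char32_t cp, Container & out)
{
    auto put = [&](unsigned v) { out.push_back(static_cast<char>(v)); };
    if (cp < 0x80) {
        put(cp);
    } else if (cp < 0x800) {
        put(0xC0 | (cp >> 6));
        put(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        put(0xE0 | (cp >> 12));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    } else {
        put(0xF0 | (cp >> 18));
        put(0x80 | ((cp >> 12) & 0x3F));
        put(0x80 | ((cp >> 6) & 0x3F));
        put(0x80 | (cp & 0x3F));
    }
}

// Inserts a sequence of code points into a char container, one UTF-8
// encoded code point at a time.
template <typename Container>
void transcode_into(std::vector<char32_t> const & cps, Container & out)
{
    for (char32_t cp : cps)
        append_utf8(cp, out);
}
```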
Let's look at a case where another simple-seeming type replacement does not work. First, the case that works:
```cpp
namespace bp = boost::parser;
auto parser = -(bp::char_ % ',');
std::vector<int> result;
auto b = bp::parse("a, b", parser, bp::ws, result);
```
`ATTR(parser)` is `std::optional<std::string>`. Even though we pass a `std::vector<int>`, everything is fine. However, if we modify this case only slightly, so that the `std::optional<std::string>` is nested within the attribute, the code becomes ill-formed.
```cpp
struct S
{
    std::vector<int> chars;
    int i;
};
namespace bp = boost::parser;
auto parser = -(bp::char_ % ',') >> bp::int_;
S result;
auto b = bp::parse("a, b 42", parser, bp::ws, result); // Ill-formed.
```
If we change `chars` to a `std::vector<char>`, the code is still ill-formed. Same if we change `chars` to a `std::string`. We must actually use `std::optional<std::string>` exactly to make the code well-formed again.
The reason the same looseness from the top-level parser does not apply to a nested parser is that, at some point in the code, the parser `-(bp::char_ % ',')` would try to assign a `std::optional<std::string>` (its own attribute, which is the first element of the attribute the whole parser normally generates) to `chars`. If there's no implicit conversion there, the code is ill-formed.
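The deciding condition is plain C++ assignability, which can be checked directly with `std::is_assignable_v`. These checks model the rule stated above and do not test Boost.Parser itself. `assign_matching_type` is a hypothetical helper.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// Only the exact type accepts a std::optional<std::string>; none of the
// container stand-ins for `chars` do.
static_assert(std::is_assignable_v<std::optional<std::string> &,
                                   std::optional<std::string>>);
static_assert(!std::is_assignable_v<std::vector<int> &,
                                    std::optional<std::string>>);
static_assert(!std::is_assignable_v<std::vector<char> &,
                                    std::optional<std::string>>);
static_assert(!std::is_assignable_v<std::string &,
                                    std::optional<std::string>>);

// Assigning the matching type works at run time as well.
inline bool assign_matching_type()
{
    std::optional<std::string> chars;
    chars = std::optional<std::string>("ab");
    return chars.has_value() && *chars == "ab";
}
```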
The take-away for this last example is that the ability to arbitrarily swap
out data types within the type of the attribute you pass to parse()
is very flexible, but is
also limited to structurally simple cases. When we discuss rules
in the next section,
we'll see how this flexibility in the types of attributes can help when writing
complicated parsers.
Those were examples of swapping out one container type for another. They make good examples because that is more likely to be surprising, and so it's getting lots of coverage here. You can also do much simpler things, like parsing with `uint_` and writing its attribute into a `double`.
In general, you can swap any type T
out of the attribute, as long as the swap would not result in some ill-formed
assignment within the parse.
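For the `uint_`-into-`double` case, the relevant assignment is from `unsigned int` (the attribute type of `uint_`) to `double`, and that assignment is well-formed. Here is a minimal sketch of that check, using a hypothetical helper `store_uint_attribute` to stand in for the assignment a parse would perform.

```cpp
#include <cassert>
#include <type_traits>

// The swap is acceptable because this assignment is well-formed C++.
static_assert(std::is_assignable_v<double &, unsigned int>);

// Hypothetical helper (illustration only): stores a generated unsigned int
// attribute into a double destination.
inline double store_uint_attribute(unsigned int generated)
{
    double out = 0.0;
    out = generated;
    return out;
}
```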
Here is another example that also produces surprising results, for a different reason.
```cpp
namespace bp = boost::parser;
constexpr auto parser = bp::char_('a') >> bp::char_('b') >> bp::char_('c') |
                        bp::char_('x') >> bp::char_('y') >> bp::char_('z');

std::string str = "abc";
bp::tuple<char, char, char> chars;
bool b = bp::parse(str, parser, chars);
assert(b);
assert(chars == bp::tuple('c', '\0', '\0'));
```
This looks wrong, but is expected behavior. At every stage of the parse that produces an attribute, Boost.Parser tries to assign that attribute to some part of the out-param attribute provided to `parse()`, if there is one. Note that `ATTR(parser)` is `std::string`, because each sequence parser is three `char_` parsers in a row, which forms a `std::string`; there are two such alternatives, so the overall attribute is also `std::string`.

During the parse, when the first parser `bp::char_('a')` matches the input, it produces the attribute `'a'` and needs to assign it to its destination. Some logic inside the sequence parser indicates that this `'a'` contributes to the value in the 0th position in the result tuple, if the result is being written into a tuple. Here, we passed a `bp::tuple<char, char, char>`, so it writes `'a'` into the first element. Each subsequent `char_` parser does the same thing, and writes over the first element. If we had passed a `std::string` as the out-param instead, the logic would have seen that the out-param attribute is a string, and would have appended `'a'` to it. Then each subsequent parser would have appended to the string.
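The difference between the tuple and string destinations can be modeled with `if constexpr`. This is a deliberately simplified sketch built on the assumptions in the paragraph above: each `char_` in the matching alternative targets position 0 of a tuple, or appends to a string. `write_char_attr` and `run_abc` are hypothetical names, and `std::tuple` stands in for `bp::tuple`.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <tuple>
#include <type_traits>

// Simplified model (not Boost.Parser's real code) of one char_ writing its
// attribute: append to a string, or assign to element 0 of a tuple.
template <typename Dest>
void write_char_attr(char c, Dest & dest)
{
    if constexpr (std::is_same_v<Dest, std::string>)
        dest.push_back(c);
    else
        std::get<0>(dest) = c; // overwrites whatever was there before
}

// Runs the three writes performed by the 'a' >> 'b' >> 'c' alternative.
template <typename Dest>
void run_abc(Dest & dest)
{
    for (char c : {'a', 'b', 'c'})
        write_char_attr(c, dest);
}
```

Run against a `std::string`, this yields `"abc"`. Run against a value-initialized `std::tuple<char, char, char>`, it leaves `('c', '\0', '\0')`, which matches the assertion in the example above.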
Boost.Parser never looks at the arity of the tuple passed to parse()
to see if there are too
many or too few elements in it, compared to the expected attribute for the
parser. In this case, there are two extra elements that are never touched.
If there had been too few elements in the tuple, you would have seen a compilation
error. The reason that Boost.Parser never does this kind of type-checking
up front is that the loose assignment logic is spread out among the individual
parsers; the top-level parse can determine what the expected attribute is,
but not whether a passed attribute of another type is a suitable stand-in.
Compatibility of `variant` attribute out-parameters

The use of a variant in an out-param is compatible if the default attribute can be assigned to the `variant`.
No other work is done to make the assignment compatible. For instance, this
will work as you'd expect:
```cpp
namespace bp = boost::parser;
std::variant<int, double> v;
auto b = bp::parse("42", bp::int_, v);
assert(b);
assert(v.index() == 0);
assert(std::get<0>(v) == 42);
```
Again, this works because v = 42
is well-formed.
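Since the rule is just assignment, the standard library alone shows both sides of it. This sketch uses `std::tuple` as a stand-in for `boost::parser::tuple`, an assumption made to keep it self-contained. `assign_int_attr` is a hypothetical helper.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <variant>

// int_ produces an int, and `v = 42` selects the int alternative.
inline std::variant<int, double> assign_int_attr(int generated)
{
    std::variant<int, double> v;
    v = generated;
    return v;
}

struct key_value { int key; double value; };

// A tuple of (int, double) is not assignable to a variant whose alternative
// is an aggregate, so an out-param like kv_or_d cannot work by assignment.
static_assert(!std::is_assignable_v<std::variant<key_value, double> &,
                                    std::tuple<int, double>>);
```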
However, other kinds of substitutions will not work. In particular, the
boost::parser::tuple
to aggregate or aggregate to boost::parser::tuple
transformations will
not work. Here's an example.
```cpp
struct key_value
{
    int key;
    double value;
};

namespace bp = boost::parser;
std::variant<key_value, double> kv_or_d;
key_value kv;
bp::parse("42 13.0", bp::int_ >> bp::double_, bp::ws, kv);      // Ok.
bp::parse("42 13.0", bp::int_ >> bp::double_, bp::ws, kv_or_d); // Error: ill-formed!
```
In this case, it would be easy for Boost.Parser to look at the alternative types covered by the variant, and do a conversion. However, there are many cases in which there is no obviously correct variant alternative type, or in which the user might expect one variant alternative type and get another. Consider a couple of cases.
```cpp
struct i_d { int i; double d; };
struct d_i { double d; int i; };
using v1 = std::variant<i_d, d_i>;

struct i_s { int i; short s; };
struct d_d { double d1; double d2; };
using v2 = std::variant<i_s, d_d>;

using tup_t = boost::parser::tuple<short, short>;
```
If we have a parser that produces a `tup_t`, and we have a `v1` attribute out-param, the correct variant alternative type clearly does not exist. This case is ambiguous, and anyone can see that neither variant alternative is a better match. If we were assigning a `tup_t` to `v2`, it's even worse. The same ambiguity exists, but to the user, `i_s` is clearly "closer" than `d_d`.
So, Boost.Parser only does assignment. If some parser P
generates a default attribute that is not assignable to a variant alternative
that you want to assign it to, you can just create a rule
that creates either an
exact variant alternative type, or the variant itself, and use P
as your rule's parser.
A call to `parse()` either considers the entire input to be in a UTF format (UTF-8, UTF-16, or UTF-32), or it considers the entire input to be in some unknown encoding. Here is how it deduces which case the call falls under:

- If the range is a sequence of `char8_t`, or if the input is a `boost::parser::utf8_view`, the input is UTF-8.
- Otherwise, if the value type of the range is `char`, the input is in an unknown encoding.
- Otherwise, the input is in a UTF encoding.
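The deduction rules can be expressed as a small compile-time function. This sketch only looks at the element type and ignores the `utf8_view` case. `deduce_encoding` and `input_encoding` are hypothetical names and are not Boost.Parser's actual trait.

```cpp
#include <cassert>
#include <type_traits>

enum class input_encoding { unknown, utf };

// Sketch of the rules listed above (illustration only): plain char means an
// unknown encoding; the other character types (char8_t, char16_t, char32_t,
// wchar_t) mean some UTF encoding.
template <typename CharT>
constexpr input_encoding deduce_encoding()
{
    if constexpr (std::is_same_v<CharT, char>)
        return input_encoding::unknown;
    else
        return input_encoding::utf;
}
```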
**Tip:** If you want to parse in ASCII-only mode, or in some other non-Unicode encoding, use only sequences of `char`.
**Tip:** If you want to ensure all input is parsed as Unicode, pass the input range `r` through one of the UTF adaptors, for example as `r | boost::parser::as_utf8`.
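**Note:** Since passing a `boost::parser::utf8_view` is a special case, and since a sequence of `char` is otherwise considered to be in an unknown encoding, `boost::parser::parse(r | boost::parser::as_utf8, p)` treats `r` as UTF-8, whereas `boost::parser::parse(r, p)` does not.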
The `trace_mode` parameter to `parse()`

Debugging parsers is notoriously difficult once they reach a certain size. To get a verbose trace of your parse, pass `boost::parser::trace::on` as the final parameter to `parse()`. It will show you the current parser being matched, the next few characters to be parsed, and any attributes generated. See the Error Handling and Debugging section of the tutorial for details.
Each call to `parse()` can optionally have a globals object associated with it. To use a particular globals object with your parser, you call `with_globals()` to create a new parser with the globals object in it:

```cpp
struct globals_t
{
    int foo;
    std::string bar;
};
auto const parser = /* ... */;
globals_t globals{42, "yay"};
auto result = boost::parser::parse("str", boost::parser::with_globals(parser, globals));
```
Every semantic action within that call to `parse()` can access the same `globals_t` object using `_globals(ctx)`.
The default error handler is great for most needs, but if you want to change it, you can do so by creating a new parser with a call to `with_error_handler()`:

```cpp
auto const parser = /* ... */;
my_error_handler error_handler;
auto result = boost::parser::parse("str", boost::parser::with_error_handler(parser, error_handler));
```
**Tip:** If your parsing environment does not allow you to report errors to a terminal, you may want to use …
**Important:** Globals and the error handler are ignored, if present, on any parser except the top-level parser.