So far we've seen examples that parse some text and generate associated attributes. Sometimes, you want to find some subrange of the input that contains what you're looking for, and you don't want to generate attributes at all.
There are two directives that affect the attribute type of any parser, raw[] and string_view[]. (We'll get to directives in more detail in the Directives section later. For now, you just need to know that a directive wraps a parser and changes some aspect of how it functions.)
raw[p] changes the attribute of the parser p that it wraps to be a subrange whose begin() and end() return the bounds of the portion of the input that p matched.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);
Note that the subrange has the iterator type std::string::const_iterator, because that's the iterator type passed to prefix_parse(). If we had passed char const * iterators to prefix_parse(), that would have been the iterator type. The only exception to this comes from Unicode-aware parsing (see Unicode Support). In some of those cases, the iterator being used in the parse is not the one you passed. For instance, if you call prefix_parse() with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view, and parse the iterators of that view. In such a case, you'll get a subrange whose iterator type is a transcoding iterator. When that happens, you can get the underlying iterator (the one you passed to prefix_parse()) by calling the .base() member function on each transcoding iterator in the returned subrange.
auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);
string_view[] has very similar semantics to raw[], except that it produces a std::basic_string_view<CharT> (where CharT is the character type of the underlying range being parsed) instead of a subrange. For this to work, the underlying range must be contiguous. Contiguity of iterators is not detectable before C++20, so this directive is only available in C++20 and later.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);
Since string_view[] produces string_views, it cannot return transcoding iterators as described above for raw[]. If you parse a sequence of CharT with string_view[], you get exactly a std::basic_string_view<CharT>. If the parse is using transcoding in the Unicode-aware path, string_view[] will decompose the transcoding iterator as necessary. If you pass a transcoding view to parse() or transcoding iterators to prefix_parse(), string_view[] will still see through the transcoding iterators without issue, and give you a string_view of part of the underlying range.
auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");

static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);