
Parsing to Find Subranges

So far we've seen examples that parse some text and generate associated attributes. Sometimes, you want to find some subrange of the input that contains what you're looking for, and you don't want to generate attributes at all.

There are two directives that affect the attribute type of any parser, raw[] and string_view[]. (We'll get to directives in more detail in the Directives section later. For now, you just need to know that a directive wraps a parser, and changes some aspect of how it functions.)

raw[]

raw[p] changes the attribute of the parser p that it wraps to be a subrange whose begin() and end() return the bounds of the portion of the input matched by p.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);

Note that the subrange has the iterator type std::string::const_iterator, because that's the iterator type passed to prefix_parse(). If we had passed char const * iterators to prefix_parse(), that would have been the iterator type. The only exception to this comes from Unicode-aware parsing (see Unicode Support). In some of those cases, the iterator being used in the parse is not the one you passed. For instance, if you call prefix_parse() with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view, and parse using the iterators of that view. In such a case, you'll get a subrange whose iterator type is a transcoding iterator. When that happens, you can get the underlying iterator (the one you passed to prefix_parse()) by calling the .base() member function on each transcoding iterator in the returned subrange.

auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);

string_view[]

string_view[] has very similar semantics to raw[], except that it produces a std::basic_string_view<CharT> (where CharT is the character type of the underlying range being parsed) instead of a subrange. For this to work, the underlying range must be contiguous. Contiguity of iterators is not detectable before C++20, so this directive is only available in C++20 and later.

namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);

Since string_view[] produces string_views, it cannot return transcoding iterators as described above for raw[]. If you parse a sequence of CharT with string_view[], you get exactly a std::basic_string_view<CharT>. If the parse is using transcoding in the Unicode-aware path, string_view[] will decompose the transcoding iterator as necessary. If you pass a transcoding view to parse() or transcoding iterators to prefix_parse(), string_view[] will still see through the transcoding iterators without issue, and give you a string_view of part of the underlying range.

auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");

static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);
