
Algorithms and Views That Use Parsers

Unless otherwise noted, all the algorithms and views are constrained very much like the way the parse() overloads are. The kinds of ranges, parsers, etc., that they accept are the same.

boost::parser::search()

As shown in The parse() API, the two patterns of parsing in Boost.Parser are whole-parse and prefix-parse. When you want to find something in the middle of the range being parsed, there's no parse API for that. You can of course make a simple parser that skips everything before what you're looking for.

namespace bp = boost::parser;
constexpr auto parser = /* ... */;
constexpr auto middle_parser = bp::omit[*(bp::char_ - parser)] >> parser;

middle_parser will skip over everything, one char_ at a time, as long as the next char_ is not the beginning of a successful match of parser. After this, control passes to parser itself. Ok, so that's not too hard to write. If you need to parse something from the middle in order to generate attributes, this is what you should use.

However, it often turns out you only need to find some subrange in the parsed range. In these cases, it would be nice to turn this into a proper algorithm in the pattern of the ones in std::ranges, since that's more idiomatic. boost::parser::search() is that algorithm. It has very similar semantics to std::ranges::search(), except that it searches not for an exact subrange, but for a match of the given parser. Like std::ranges::search(), it returns a subrange (boost::parser::subrange in C++17, std::ranges::subrange in C++20 and later).

namespace bp = boost::parser;
auto result = bp::search("aaXYZq", bp::lit("XYZ"), bp::ws);
assert(!result.empty());
assert(std::string_view(result.begin(), result.end() - result.begin()) == "XYZ");

Since boost::parser::search() returns a subrange, whatever parser you give it produces no attribute. I wrote bp::lit("XYZ") above; if I had written bp::string("XYZ") instead, the result (and lack of std::string construction) would not change.

As you can see above, one aspect of boost::parser::search() differs intentionally from the conventions of the std::ranges algorithms — it accepts C-style strings, treating them as if they were proper ranges.

Also, boost::parser::search() knows how to accommodate your iterator type. You can pass the C-style string "aaXYZq" as in the example above, or "aaXYZq" | bp::as_utf32, or "aaXYZq" | bp::as_utf8, or even "aaXYZq" | bp::as_utf16, and it will return a subrange whose iterators are the type that you passed as input, even though internally the iterator type might be something different (a UTF-8 -> UTF-32 transcoding iterator in Unicode parsing, as with all the | bp::as_utfN examples above). As long as you pass a range to be parsed whose value type is char, char8_t, char32_t, or that is adapted using some combination of as_utfN adaptors, this accommodation will operate correctly.

boost::parser::search() has multiple overloads. You can pass a range or an iterator/sentinel pair, and you can pass a skip parser or not. That's four overloads. Also, all four overloads take an optional boost::parser::trace parameter at the end. This is really handy for investigating why you're not finding something in the input that you expected to.

boost::parser::search_all

boost::parser::search_all creates boost::parser::search_all_views. boost::parser::search_all_view is a std::views-style view. It produces a range of subranges. Each subrange it produces is the next match of the given parser in the parsed range.

namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::search_all(bp::lit("XYZ"));
int count = 0;
// Prints XYZ XYZ XYZ XYZ.
for (auto subrange : r) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << " ";
    ++count;
}
std::cout << "\n";
assert(count == 4);
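To make the "next match" semantics concrete, here is a rough standard-library sketch of the same iteration for the special case of a literal match. It is eager where boost::parser::search_all_view is lazy, and it is not how Boost.Parser implements the view. The name find_all_literal is an assumption made up for this example.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical helper, for illustration only.
// Collects every non-overlapping occurrence of `needle` in `text`, in order.
// Each search resumes just past the end of the previous match.
inline std::vector<std::string_view> find_all_literal(std::string_view text,
                                                      std::string_view needle)
{
    std::vector<std::string_view> matches;
    if (needle.empty())
        return matches; // an empty needle would match everywhere; ignore it here
    for (auto pos = text.find(needle); pos != std::string_view::npos;
         pos = text.find(needle, pos + needle.size())) {
        matches.push_back(text.substr(pos, needle.size()));
    }
    return matches;
}
```

For the input used above, find_all_literal("XYZaaXYZbaabaXYZXYZ", "XYZ") produces four matches, including the two adjacent ones at the end.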

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::search_all: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.

boost::parser::search_all can be called with, and boost::parser::search_all_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::split

boost::parser::split creates boost::parser::split_views. boost::parser::split_view is a std::views-style view. It produces a range of subranges of the parsed range split on matches of the given parser. You can think of boost::parser::split_view as being the complement of boost::parser::search_all_view, in that boost::parser::split_view produces the subranges between the subranges produced by boost::parser::search_all_view. boost::parser::split_view has very similar semantics to std::ranges::split_view. Just like std::ranges::split_view, boost::parser::split_view will produce empty ranges between the beginning/end of the parsed range and an adjacent match, or between adjacent matches.

namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::split(bp::lit("XYZ"));
int count = 0;
// Prints '' 'aa' 'baaba' '' ''.
for (auto subrange : r) {
    std::cout << "'" << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << "' ";
    ++count;
}
std::cout << "\n";
assert(count == 5);

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::split: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.

boost::parser::split can be called with, and boost::parser::split_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::replace
Important

boost::parser::replace and boost::parser::replace_view are not available on MSVC in C++17 mode.

boost::parser::replace creates boost::parser::replace_views. boost::parser::replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given replacement range replacement. Wherever in the parsed range a match to the given parser parser is found, replacement is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::replace_view does not produce empty subranges, unless replacement is empty.

namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." | bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX");
int count = 0;
// Prints My credit card number is XXXX-XXXX-XXXX-XXXX.
for (auto subrange : rng) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin());
    ++count;
}
std::cout << "\n";
assert(count == 3);
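The rule about which subranges are produced can be sketched with the standard library alone, again for the special case of a literal match. This is an eager approximation under that assumption, not Boost.Parser's implementation, and replace_literal_pieces is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, for illustration only.
// Produces the pieces of `text` in order: each unmatched stretch as-is, and
// `replacement` in place of each occurrence of `needle`. Empty unmatched
// stretches are skipped; `replacement` is produced even if it is empty.
inline std::vector<std::string_view> replace_literal_pieces(std::string_view text,
                                                            std::string_view needle,
                                                            std::string_view replacement)
{
    std::vector<std::string_view> pieces;
    if (needle.empty()) {
        if (!text.empty())
            pieces.push_back(text); // nothing sensible to match; keep the text whole
        return pieces;
    }
    std::size_t start = 0;
    while (true) {
        auto const pos = text.find(needle, start);
        auto const stop = pos == std::string_view::npos ? text.size() : pos;
        if (stop != start)
            pieces.push_back(text.substr(start, stop - start));
        if (pos == std::string_view::npos)
            break;
        pieces.push_back(replacement);
        start = pos + needle.size();
    }
    return pieces;
}
```

With a literal card number as the needle, the sentence above splits into the same three pieces: the text before the match, the replacement, and the final period. With "XYZaaXYZ", "XYZ", and "-", the result is "-", "aa", "-", with no empty pieces at either end.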

If the iterator types Ir and Ireplacement for the r and replacement ranges passed are identical (as in the example above), the iterator type for the subranges produced is Ir. If they are different, an implementation-defined type is used for the iterator. This type is the moral equivalent of a std::variant<Ir, Ireplacement>. This works as long as Ir and Ireplacement are compatible. To be compatible, they must have common reference, value, and rvalue reference types, as determined by std::common_type_t. One advantage to this scheme is that the range of subranges represented by boost::parser::replace_view is easily joined back into a single range.

namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." | bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX") | std::views::join;
std::string replace_result;
for (auto ch : rng) {
    replace_result.push_back(ch);
}
assert(replace_result == "My credit card number is XXXX-XXXX-XXXX-XXXX.");

Note that we could not have written std::string replace_result(rng.begin(), rng.end()). This is ill-formed because the std::string range constructor takes two iterators of the same type, but decltype(rng.end()) is a sentinel type different from decltype(rng.begin()).

Though the ranges r and replacement can both be C-style strings, boost::parser::replace_view must know the end of replacement before it does any work. This is because the subranges produced are all common ranges, and so if replacement is not, a common range must be formed from it. So do not pass a very long C-style string as replacement expecting the cost of finding its end to be deferred until the range is used; that cost is paid up front.

Here V is the type of r and ReplacementV is the type of replacement. ReplacementV is constrained almost exactly the same as V. V must model parsable_range and std::ranges::viewable_range. ReplacementV is the same, except that it can also be a std::ranges::input_range, whereas V must be a std::ranges::forward_range.

You may wonder what happens when you pass a UTF-N range for r, and a UTF-M range for replacement. What happens in this case is silent transcoding of replacement from UTF-M to UTF-N by the boost::parser::replace range adaptor. This doesn't require memory allocation; boost::parser::replace just slaps | boost::parser::as_utfN onto replacement. However, since Boost.Parser treats char ranges as unknown encoding, boost::parser::replace will not transcode from char ranges. So calls like this won't work:

char const str[] = "some text";
char const replacement_str[] = "some text";
namespace bp = boost::parser;
auto r = str | bp::replace(parser, replacement_str | bp::as_utf8); // Error: ill-formed!  Can't mix plain-char inputs and UTF replacements.

This does not work, even though char and UTF-8 are the same size. If r and replacement are both ranges of char, everything will work of course. It's just mixing char and UTF-encoded ranges that does not work.

All the details called out in the subsection on boost::parser::search() above apply to boost::parser::replace: its parser produces no attributes; it accepts C-style strings for the r and replacement parameters as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.

boost::parser::replace can be called with, and boost::parser::replace_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

boost::parser::transform_replace
Important

boost::parser::transform_replace and boost::parser::transform_replace_view are not available on MSVC in C++17 mode.

Important

boost::parser::transform_replace and boost::parser::transform_replace_view are not available on GCC in C++20 mode before GCC 12.

boost::parser::transform_replace creates boost::parser::transform_replace_views. boost::parser::transform_replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given invocable f. Wherever in the parsed range a match to the given parser parser is found, let parser's attribute be attr; f(std::move(attr)) is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::transform_replace_view does not produce empty subranges, unless f(std::move(attr)) is empty. Here is an example.

auto string_sum = [](std::vector<int> const & ints) {
    return std::to_string(std::accumulate(ints.begin(), ints.end(), 0));
};

auto rng = "There are groups of [1, 2, 3, 4, 5] in the set." |
           bp::transform_replace('[' >> bp::int_ % ',' >> ']', bp::ws, string_sum);
int count = 0;
// Prints "There are groups of 15 in the set."
for (auto subrange : rng) {
    for (auto ch : subrange) {
        std::cout << ch;
    }
    ++count;
}
std::cout << "\n";
assert(count == 3);

Let the type decltype(f(std::move(attr))) be Replacement. Replacement must be a range, and must be compatible with r. See the description of boost::parser::replace_view's iterator compatibility requirements in the section above for details.

As with boost::parser::replace, boost::parser::transform_replace can be flattened from a view of subranges into a view of elements by piping it to std::views::join. See the section on boost::parser::replace above for an example.

Just like boost::parser::replace and boost::parser::replace_view, boost::parser::transform_replace and boost::parser::transform_replace_view do silent transcoding of the result to the appropriate UTF, if applicable. If both r and f(std::move(attr)) are ranges of char, or are both the same UTF, no transcoding occurs. If one of r and f(std::move(attr)) is a range of char and the other is some UTF, the program is ill-formed.

boost::parser::transform_replace_view will move each attribute into f; f may move from the argument or copy it as desired. f may return an lvalue reference. If it does so, the address of the reference will be taken and stored within boost::parser::transform_replace_view. Otherwise, the value returned by f is moved into boost::parser::transform_replace_view. In either case, the value type of boost::parser::transform_replace_view is always a subrange.
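One way to picture the "store a pointer or store a value" rule is as a type computation on the result of f(std::move(attr)). The sketch below is purely illustrative: stored_result_t, by_value, and by_reference are names invented here, and Boost.Parser's actual internals may look nothing like this.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical alias, for illustration only.
// If calling F with an rvalue Attr yields an lvalue reference T&, store a T*
// (std::add_pointer_t strips the reference first). Otherwise store the
// returned value itself.
template<typename F, typename Attr>
using stored_result_t = std::conditional_t<
    std::is_lvalue_reference_v<std::invoke_result_t<F, Attr>>,
    std::add_pointer_t<std::invoke_result_t<F, Attr>>,
    std::decay_t<std::invoke_result_t<F, Attr>>>;

// Two stand-ins for f; only their signatures matter, so they are declared
// but never defined or called.
struct by_value
{
    std::string operator()(std::vector<int> ints) const;
};
struct by_reference
{
    std::string const & operator()(std::vector<int> ints) const;
};

static_assert(std::is_same_v<stored_result_t<by_value, std::vector<int>>, std::string>);
static_assert(
    std::is_same_v<stored_result_t<by_reference, std::vector<int>>, std::string const *>);
```

The practical consequence for the reference-returning case is the usual one: whatever f refers to must outlive the view, since only its address is kept.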

boost::parser::transform_replace can be called with, and boost::parser::transform_replace_view can be constructed with, a skip parser or not, and you can always pass boost::parser::trace at the end of any of their overloads.

