Parsing into `struct`s and `class`es

So far, we've seen only simple parsers that parse the same value repeatedly (with or without commas and spaces). It's also very common to parse a few values in a specific sequence. Let's say you want to parse an employee record. Here's a parser you might write:

namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

The attribute type for employee_parser is boost::parser::tuple<int, std::string, std::string, double>. That's great, in that you got all the parsed data for the record without having to write any semantic actions. It's not so great that you now have to get all the individual elements out by their indices, using get(). It would be much nicer to parse into the final data structure that your program is going to use. This is often some struct or class. Boost.Parser supports parsing into arbitrary aggregate structs, and non-aggregates that are constructible from the tuple at hand.

Aggregate types as attributes

If we have a struct that has data members of the same types listed in the boost::parser::tuple attribute type for employee_parser, it would be nice to parse directly into it, instead of parsing into a tuple and then constructing our struct later. Fortunately, this just works in Boost.Parser. Here is an example of parsing straight into a compatible aggregate type.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    auto quoted_string = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
    auto employee_p = bp::lit("employee")
        >> '{'
        >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_
        >> '}';

    employee record;
    auto const result = bp::parse(input, employee_p, bp::ws, record);

    if (result) {
        std::cout << "You entered:\nage:      " << record.age
                  << "\nsurname:  " << record.surname
                  << "\nforename: " << record.forename
                  << "\nsalary  : " << record.salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Unfortunately, this takes advantage of the loose attribute assignment logic; the employee_p parser still has a boost::parser::tuple attribute. See The parse() API for a description of attribute out-param compatibility.
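
To make the idea of assigning a tuple's elements to an aggregate concrete, here is a minimal standard-library sketch. It is only an analogy: it uses std::tuple and std::apply rather than boost::parser::tuple, the helper name aggregate_from_tuple is hypothetical, and Boost.Parser's own implementation may well differ.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

// Hypothetical helper (not part of Boost.Parser): brace-initializes the
// aggregate T from the elements of a tuple, in order.
template<typename T, typename Tuple>
T aggregate_from_tuple(Tuple && tup)
{
    return std::apply(
        [](auto &&... elems) {
            return T{std::forward<decltype(elems)>(elems)...};
        },
        std::forward<Tuple>(tup));
}

inline void aggregate_from_tuple_example()
{
    // Same element types, in the same order, as employee's data members.
    std::tuple<int, std::string, std::string, double> tup{
        42, "Smith", "Pat", 50000.0};
    employee const e = aggregate_from_tuple<employee>(std::move(tup));
    assert(e.age == 42);
    assert(e.surname == "Smith");
}
```

The point of the sketch is that nothing about employee itself changes; the conversion relies only on the members matching the tuple's elements by position and type.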

For this reason, it's even more common to make a rule that returns a specific type like employee. Just by giving the rule a struct type, we make sure that this parser always generates an employee struct as its attribute, no matter where it is in the parse. If we made a simple parser P that uses the employee_p rule, like bp::int_ >> employee_p, P's attribute type would be boost::parser::tuple<int, employee>.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <type_traits>


struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

bp::rule<struct quoted_string, std::string> quoted_string = "quoted name";
bp::rule<struct employee_p, employee> employee_p = "employee";

auto quoted_string_def = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
auto employee_p_def = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';

BOOST_PARSER_DEFINE_RULES(quoted_string, employee_p);

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    static_assert(std::is_aggregate_v<std::decay_t<employee &>>);

    auto const result = bp::parse(input, employee_p, bp::ws);

    if (result) {
        std::cout << "You entered:\nage:      " << result->age
                  << "\nsurname:  " << result->surname
                  << "\nforename: " << result->forename
                  << "\nsalary  : " << result->salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}

Just as you can pass a struct as an out-param to parse() when the parser's attribute type is a tuple, you can also pass a tuple as an out-param to parse() when the parser's attribute type is a struct:

// Using the employee_p rule from above, with attribute type employee...
boost::parser::tuple<int, std::string, std::string, double> tup;
auto const result = bp::parse(input, employee_p, bp::ws, tup); // Ok!

	Important
	This automatic use of `struct`s as if they were tuples depends on a bit of metaprogramming. Due to compiler limits, the metaprogram that detects the number of data members of a `struct` is limited to a maximum number of members. Fortunately, that limit is configurable; see `BOOST_PARSER_MAX_AGGREGATE_SIZE`.
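
One common way to count the data members of an aggregate is to find the largest number of placeholder arguments the type accepts inside braces. The sketch below illustrates that general technique and why it needs an upper bound; it is not Boost.Parser's actual code. The names any_type, initializable_with, member_count, and max_members are hypothetical, with max_members standing in for a limit like `BOOST_PARSER_MAX_AGGREGATE_SIZE`.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Placeholder that claims to convert to any type. The conversion is
// declared but never defined, because it is only used inside decltype.
template<std::size_t>
struct any_type
{
    template<typename T>
    operator T() const;
};

// Primary template: T cannot be brace-initialized from the placeholders.
template<typename T, typename Indices, typename = void>
struct initializable_with : std::false_type
{};

// Chosen when T{any_type<0>{}, any_type<1>{}, ...} is well-formed.
template<typename T, std::size_t... I>
struct initializable_with<
    T,
    std::index_sequence<I...>,
    std::void_t<decltype(T{any_type<I>{}...})>> : std::true_type
{};

// Hypothetical probe limit; the search has to start somewhere.
constexpr std::size_t max_members = 8;

// Searches downward from N for the largest initializer count T accepts.
// In this sketch, a struct with more than max_members members is
// reported as having exactly max_members.
template<typename T, std::size_t N = max_members>
constexpr std::size_t member_count()
{
    if constexpr (initializable_with<T, std::make_index_sequence<N>>::value)
        return N;
    else if constexpr (N == 0)
        return 0;
    else
        return member_count<T, N - 1>();
}

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

static_assert(member_count<employee>() == 4);
```

Each probe instantiates another template, so a larger bound means more work for the compiler. This simple version also ignores brace elision, so members that are themselves arrays or aggregates would need extra handling.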

General `class` types as attributes

Often, the type you want to produce from your parse is not an aggregate. It would be nice if Boost.Parser could detect that the elements of a tuple attribute are usable as the arguments to some type's constructor, and construct that type for you. So, Boost.Parser does that.

#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>


namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a string followed by two unsigned integers. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto string_uint_uint =
        bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
    std::string string_from_parse;
    if (parse(input, string_uint_uint, bp::ws, string_from_parse))
        std::cout << "That yields this string: " << string_from_parse << "\n";
    else
        std::cout << "Parse failure.\n";

    std::cout << "Enter an unsigned integer followed by a string. ";
    std::getline(std::cin, input);
    std::cout << input << "\n";

    constexpr auto uint_string = bp::uint_ >> bp::char_ >> bp::char_;
    std::vector<std::string> vector_from_parse;
    if (parse(input, uint_string, bp::ws, vector_from_parse)) {
        std::cout << "That yields this vector of strings:\n";
        for (auto && str : vector_from_parse) {
            std::cout << "  '" << str << "'\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}

Let's look at the first parse.

constexpr auto string_uint_uint =
    bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
std::string string_from_parse;
if (parse(input, string_uint_uint, bp::ws, string_from_parse))
    std::cout << "That yields this string: " << string_from_parse << "\n";
else
    std::cout << "Parse failure.\n";

Here, we use the parser string_uint_uint, which produces a boost::parser::tuple<std::string, unsigned int, unsigned int> attribute. When we try to parse that into an out-param std::string attribute, it just works. This is because std::string has a constructor that takes a std::string, an offset, and a length. Here's the other parse:
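
To see what that constructor does with such a tuple, here is a small standard-library sketch. It uses std::tuple and std::make_from_tuple as stand-ins for the Boost.Parser machinery; the helper name string_from_attribute and the sample values are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical helper: constructs a std::string from the three tuple
// elements, as if by std::string(str, pos, len).
inline std::string string_from_attribute(
    std::tuple<std::string, unsigned int, unsigned int> const & attr)
{
    return std::make_from_tuple<std::string>(attr);
}

inline void string_from_attribute_example()
{
    // Sample attribute values: a string, an offset, and a length.
    auto const attr = std::make_tuple(std::string("hello"), 1u, 3u);
    // The substring of "hello" starting at offset 1, of length 3.
    assert(string_from_attribute(attr) == "ell");
}
```

So the resulting string is a substring of the first tuple element, not a concatenation of all three values.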

constexpr auto uint_string = bp::uint_ >> bp::char_ >> bp::char_;
std::vector<std::string> vector_from_parse;
if (parse(input, uint_string, bp::ws, vector_from_parse)) {
    std::cout << "That yields this vector of strings:\n";
    for (auto && str : vector_from_parse) {
        std::cout << "  '" << str << "'\n";
    }
} else {
    std::cout << "Parse failure.\n";
}

Now we have the parser uint_string, which produces a boost::parser::tuple<unsigned int, std::string> attribute (the two chars at the end combine into a std::string). Those two values can be used to construct a std::vector<std::string>, via the constructor that takes a count and a value.
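
The same kind of construction can be sketched with the standard library alone. As before, std::tuple and std::make_from_tuple stand in for what Boost.Parser does internally, and the helper name strings_from_attribute and the sample values are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: constructs the vector from the two tuple
// elements, as if by std::vector<std::string>(count, value).
inline std::vector<std::string> strings_from_attribute(
    std::tuple<unsigned int, std::string> const & attr)
{
    return std::make_from_tuple<std::vector<std::string>>(attr);
}

inline void strings_from_attribute_example()
{
    // Sample attribute values: a count and a two-character string.
    auto const attr = std::make_tuple(3u, std::string("ab"));
    auto const strings = strings_from_attribute(attr);
    assert(strings.size() == 3u);
    assert(strings[0] == "ab" && strings[2] == "ab");
}
```

The unsigned integer becomes the number of elements, and every element is a copy of the string.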

Just like with using aggregates in place of tuples, non-aggregate class types can be substituted for tuples in most places. That includes using a non-aggregate class type as the attribute type of a rule.

However, while compatible tuples can be substituted for aggregates, you can't substitute a tuple for some class type T just because the tuple could have been used to construct T. Think of trying to invert the substitution in the second parse above. Converting a std::vector<std::string> into a boost::parser::tuple<unsigned int, std::string> makes no sense.