
Error Handling and Debugging

Error handling

Boost.Parser has good error reporting built into it. Consider what happens when we fail to parse at an expectation point (created using operator>). If I feed the parser from the Parsing JSON With Callbacks example a file called sample.json containing this input (note the unmatched '['):

{
    "key": "value",
    "foo": [, "bar": []
}

This is the error message that is printed to the terminal:

sample.json:3:12: error: Expected ']' here:
    "foo": [, "bar": []
            ^

That message is formatted like the diagnostics produced by Clang and GCC. It quotes the line on which the failure occurred, and even puts a caret under the exact position at which the parse failed. This error message is suitable for many kinds of end-users, and interoperates well with anything that supports Clang and/or GCC diagnostics.

Most of Boost.Parser's error handlers format their diagnostics this way, though you are not bound by that. You can make an error handler type that does whatever you want, as long as it meets the error handler interface.

The Boost.Parser error handlers are:

You can set the error handler to any of these, or one of your own, using with_error_handler() (see The parse() API). If you do not set one, default_error_handler will be used.

How diagnostics are generated

Boost.Parser only generates error messages like the ones in this page at failed expectation points, like a > b, where you have successfully parsed a, but then cannot successfully parse b. This may seem limited to you. It's actually the best that we can do.

In order for error handling to happen other than at expectation points, we would have to know that no further processing might take place, and we usually cannot know that, because Boost.Parser has P1 | P2 | ... | Pn parsers ("or_parsers"). If any one of these parsers Pi fails to match, it is not allowed to fail the parse; the next one (Pi+1) might match. If we get to the end of the alternatives of the or_parser and Pn fails, we still cannot fail the top-level parse, because the or_parser might be a subparser within a parent or_parser.

Ok, so what might we do? Perhaps we could at least indicate when we ran into end-of-input. But we cannot, for exactly the same reason already stated. For any parser P, reaching end-of-input is a failure for P, but not necessarily for the whole parse.

Perhaps we could record the farthest point ever reached during the parse, and report that at the top level, if the top-level parser fails. That would be of little help without knowing which parser was active when we reached that point, and tracking that would require some sort of repeated memory allocation: in Boost.Parser, the progress point of the parser is stored exclusively on the stack, so by the time we fail the top-level parse, all those far-reaching stack frames are long gone. Not the best.

Worse still, knowing how far you got in the parse and which parser was active is not very useful. Consider this.

#include <boost/parser/parser.hpp>

namespace bp = boost::parser;
auto a_b = bp::char_('a') >> bp::char_('b');
auto c_b = bp::char_('c') >> bp::char_('b');
auto result = bp::parse("acb", a_b | c_b);

If we reported the farthest-reaching parser and its position, it would be the a_b parser, at position "cb" in the input. Is this really enlightening? Was the error in the input putting the 'a' at the beginning, or putting the 'c' in the middle? If you point the user at a_b as the parser that failed, and never mention c_b, you are potentially just steering them in the wrong direction.
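Here is a toy model of the "report the farthest point" idea, written in plain C++ rather than Boost.Parser. The alternative and farthest_failure types and the report_farthest() function are hypothetical, and the model only handles alternatives made of literal characters, but that is enough to replay the a_b | c_b example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Toy model, not Boost.Parser code: each alternative is a named sequence
// of literal characters that must match at the start of the input.
struct alternative
{
    std::string name;
    std::string expected;
};

struct farthest_failure
{
    std::string parser;     // the failing alternative that got farthest
    std::string remaining;  // the input left where that alternative failed
};

// Tries every alternative and records which failing one got farthest
// before its first mismatch (or the end of input).
farthest_failure report_farthest(std::string_view input,
                                 std::vector<alternative> const & alts)
{
    farthest_failure best{"", std::string(input)};
    std::size_t best_pos = 0;
    for (auto const & alt : alts) {
        // Count how many leading characters of this alternative match.
        std::size_t pos = 0;
        while (pos < alt.expected.size() && pos < input.size() &&
               input[pos] == alt.expected[pos]) {
            ++pos;
        }
        // Alternatives that matched completely are not failures; skip them.
        bool const failed = pos < alt.expected.size();
        // Keep the first failure, then only strictly farther ones.
        if (failed && (best.parser.empty() || pos > best_pos)) {
            best_pos = pos;
            best = {alt.name, std::string(input.substr(pos))};
        }
    }
    return best;
}
```

Called as report_farthest("acb", {{"a_b", "ab"}, {"c_b", "cb"}}), it names a_b and leaves "cb" unconsumed. That is exactly the unhelpful report described above: c_b never appears in it.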

All error messages must come from failed expectation points. Consider parsing JSON. If you open a list with '[', you know that you're parsing a list, and if the list is ill-formed, you'll get an error message saying so. If you open an object with '{', the same thing is possible — when missing the matching '}', you can tell the user, "That's not an object", and this is useful feedback. The same thing with a partially parsed number, etc. If the JSON parser does not build in expectations like matched braces and brackets, how can Boost.Parser know that a missing '}' is really a problem, and that no later parser will match the input even without the '}'?

Important

The bottom line is that you should build expectation points into your parsers using operator> as much as possible.

Using error handlers in semantic actions

You can get access to the error handler within any semantic action by calling _error_handler(ctx) (see The Parse Context). Any error handler must have the following member functions:

template<typename Context, typename Iter>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context,
    Iter it) const;

template<typename Context>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context) const;

If you call the second one, the one without the iterator parameter, it will call the first with _where(context).begin() as the iterator parameter. The one without the iterator is the one you will use most often. The one with the explicit iterator parameter can be useful in situations where you have messages that are related to each other, associated with multiple locations. For instance, if you are parsing XML, you may want to report that a close-tag does not match its associated open-tag by showing the line where the open-tag was found. That may of course not be located anywhere near _where(ctx).begin(). (A description of _globals() is below.)

[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            open_tag_msg,
            ctx,
            _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            close_tag_msg,
            ctx);

        // Explicitly fail the parse.  Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}

_report_error() and _report_warning()

There are also some convenience functions that make the above code a little less verbose, _report_error() and _report_warning():

[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _report_error(ctx, open_tag_msg, _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _report_error(ctx, close_tag_msg);

        // Explicitly fail the parse.  Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}

You should use these less verbose functions almost all the time. The only time you would want to use _error_handler() directly is when you are using a custom error handler, and you want access to some part of its interface besides diagnose().

Though there is support for reporting warnings using the functions above, none of the error handlers supplied by Boost.Parser will ever report a warning. Warnings are strictly for user code.

For more information on the rest of the error handling and diagnostic API, see the header reference pages for error_handling_fwd.hpp and error_handling.hpp.

Creating your own error handler

Creating your own error handler is pretty easy; you just need to implement three member functions. Say you want an error handler that writes diagnostics to a file. Here's how you might do that.

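#include <boost/parser/parser.hpp>

#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>

namespace bp = boost::parser;

struct logging_error_handler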
{
    logging_error_handler() {}
    logging_error_handler(std::string_view filename) :
        filename_(filename), ofs_(filename_)
    {
        if (!ofs_)
            throw std::runtime_error("Could not open file.");
    }

    // This is the function called by Boost.Parser after a parser fails the
    // parse at an expectation point and throws a parse_error.  It is expected
    // to create a diagnostic message, and put it where it needs to go.  In
    // this case, we're writing it to a log file.  This function returns a
    // bp::error_handler_result, which is an enum with two enumerators -- fail
    // and rethrow.  Returning fail fails the top-level parse; returning
    // rethrow just re-throws the parse_error exception that got us here in
    // the first place.
    template<typename Iter, typename Sentinel>
    bp::error_handler_result
    operator()(Iter first, Sentinel last, bp::parse_error<Iter> const & e) const
    {
        bp::write_formatted_expectation_failure_error_message(
            ofs_, filename_, first, last, e);
        return bp::error_handler_result::fail;
    }

    // This function is for users to call within a semantic action to produce
    // a diagnostic.
    template<typename Context, typename Iter>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context,
        Iter it) const
    {
        bp::write_formatted_message(
            ofs_,
            filename_,
            bp::_begin(context),
            it,
            bp::_end(context),
            message);
    }

    // This is just like the other overload of diagnose(), except that it
    // determines the Iter parameter for the other overload by calling
    // _where(ctx).
    template<typename Context>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context) const
    {
        diagnose(kind, message, context, bp::_where(context).begin());
    }

    std::string filename_;
    mutable std::ofstream ofs_;
};

That's it. You just need to do the important work of the error handler in its call operator, and then implement the two overloads of diagnose() that it must provide for use inside semantic actions. The default implementation of these is even available as the free function write_formatted_message(), so you can just call that, as you see above. Here's how you might use it.

int main()
{
    std::cout << "Enter a list of integers, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto parser = bp::int_ >> *(',' > bp::int_);
    logging_error_handler error_handler("parse.log");
    auto const result = bp::parse(input, bp::with_error_handler(parser, error_handler));

    if (result) {
        std::cout << "It looks like you entered:\n";
        for (int x : *result) {
            std::cout << x << "\n";
        }
    }
}

We just define a logging_error_handler, and pass it by reference to with_error_handler(), which decorates the top-level parser with the error handler. We could not have written bp::with_error_handler(parser, logging_error_handler("parse.log")), because with_error_handler() does not accept rvalues. This is because the error handler eventually goes into the parse context, and the parse context stores only pointers and iterators, which keeps it cheap to copy.

If we run the example and give it the input "1,", this shows up in the log file:

parse.log:1:2: error: Expected int_ here (end of input):
1,
  ^

Fixing ill-formed code

Sometimes, during the writing of a parser, you make a simple mistake that is diagnosed horrifyingly, due to the high number of template instantiations between the line you just wrote and the point of use (usually, the call to parse()). By "sometimes", I mean "almost always and many, many times". Boost.Parser has a workaround for situations like this. The workaround is to make the ill-formed code well-formed in as many circumstances as possible, and then do a runtime assert instead.

Usually, C++ programmers try to catch mistakes as early as possible. That usually means making as much bad code ill-formed as possible. Counter-intuitively, this does not work well in parser combinator situations. For an example of just how dramatically different these two debugging scenarios can be with Boost.Parser, please see the very long discussion in the none is weird section of Rationale.

If you are morally opposed to this approach, or just hate fun, good news: you can turn off the use of this technique entirely by defining BOOST_PARSER_NO_RUNTIME_ASSERTIONS.

Runtime debugging

Debugging parsers is hard. Any parser above a certain complexity level is nearly impossible to debug simply by looking at the parser's code. Stepping through the parse in a debugger is even worse. To provide a reasonable chance of debugging your parsers, Boost.Parser has a trace mode that you can turn on simply by providing an extra parameter to parse() or callback_parse():

boost::parser::parse(input, parser, boost::parser::trace::on);

Every overload of parse() and callback_parse() takes this final parameter, which is defaulted to boost::parser::trace::off.

If we trace a substantial parser, we will see a lot of output. Each code point of the input must be considered, one at a time, to see if a certain rule matches. As an example, let's trace a parse using the JSON parser from Parsing JSON. The input is "null". null is one of the types that a JavaScript value can have; the top-level parser in the JSON parser example is:

auto const value_p_def =
    number | bp::bool_ | null | string | array_p | object_p;

So, a JSON value can be a number, or a Boolean, a null, etc. During the parse, each alternative will be tried in turn, until one is matched. I picked null because it is relatively close to the beginning of the value_p_def alternative parser. Even so, the output is pretty huge. Let's break it down as we go:

[begin value; input="null"]

Each parser is traced as [begin foo; ...], then the parsing operations themselves, and then [end foo; ...]. The name of a rule is used as its name in the begin and end parts of the trace. Non-rules have a name that is similar to the way the parser looked when you wrote it. Most lines will have the next few code points of the input quoted, as we have here (input="null").

[begin number | bool_ | null | string | ...; input="null"]

This shows the beginning of the parser inside the rule value — the parser that actually does all the work. In the example code, this parser is called value_p_def. Since it isn't a rule, we have no name for it, so we show its implementation in terms of subparsers. Since it is a bit long, we don't print the entire thing. That's why that ellipsis is there.

[begin number; input="null"]
  [begin raw[lexeme[ >> ...]][<<action>>]; input="null"]

Now we're starting to see the real work being done. number is a somewhat complicated parser that does not match "null", so there's a lot to wade through when following the trace of its attempt to do so. One thing to note is that, since we cannot print a name for an action, we just print "<<action>>". Something similar happens when we come to an attribute that we cannot print, because it has no stream insertion operation. In that case, "<<unprintable-value>>" is printed.

    [begin raw[lexeme[ >> ...]]; input="null"]
      [begin lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
        [begin -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
          [begin -char_('-'); input="null"]
            [begin char_('-'); input="null"]
              no match
            [end char_('-'); input="null"]
            matched ""
            attribute: <<empty>>
          [end -char_('-'); input="null"]
          [begin char_('1', '9') >> *digit | char_('0'); input="null"]
            [begin char_('1', '9') >> *digit; input="null"]
              [begin char_('1', '9'); input="null"]
                no match
              [end char_('1', '9'); input="null"]
              no match
            [end char_('1', '9') >> *digit; input="null"]
            [begin char_('0'); input="null"]
              no match
            [end char_('0'); input="null"]
            no match
          [end char_('1', '9') >> *digit | char_('0'); input="null"]
          no match
        [end -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
        no match
      [end lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
      no match
    [end raw[lexeme[ >> ...]]; input="null"]
    no match
  [end raw[lexeme[ >> ...]][<<action>>]; input="null"]
  no match
[end number; input="null"]
[begin bool_; input="null"]
  no match
[end bool_; input="null"]

number and boost::parser::bool_ did not match, but null will:

[begin null; input="null"]
  [begin "null" >> attr(null); input="null"]
    [begin "null"; input="null"]
      [begin string("null"); input="null"]
        matched "null"
        attribute:
      [end string("null"); input=""]
      matched "null"
      attribute: null

Finally, this parser actually matched, and the match generated the attribute null, which is a special value of the type json::value. Since we were matching the string literal "null", there was no attribute earlier; one only appears once we reach the attr(null) parser.

        [end "null"; input=""]
        [begin attr(null); input=""]
          matched ""
          attribute: null
        [end attr(null); input=""]
        matched "null"
        attribute: null
      [end "null" >> attr(null); input=""]
      matched "null"
      attribute: null
    [end null; input=""]
    matched "null"
    attribute: null
  [end number | bool_ | null | string | ...; input=""]
  matched "null"
  attribute: null
[end value; input=""]
--------------------
parse succeeded
--------------------

At the very end of the parse, the trace code prints out whether the top-level parse succeeded or failed.

Some things to be aware of when looking at Boost.Parser trace output:

