Boost.Parser has good error reporting built into it. Consider what happens when we fail to parse at an expectation point (created using operator>). If I feed the parser from the Parsing JSON With Callbacks example a file called sample.json containing this input (note the unmatched '['):
{
    "key": "value",
    "foo": [,
    "bar": []
}
This is the error message that is printed to the terminal:
sample.json:3:12: error: Expected ']' here:
    "foo": [,
            ^
That message is formatted like the diagnostics produced by Clang and GCC. It quotes the line on which the failure occurred, and even puts a caret under the exact position at which the parse failed. This error message is suitable for many kinds of end-users, and interoperates well with anything that supports Clang and/or GCC diagnostics.
Most of Boost.Parser's error handlers format their diagnostics this way, though you are not bound by that. You can make an error handler type that does whatever you want, as long as it meets the error handler interface.
The Boost.Parser error handlers are:
default_error_handler: Produces formatted diagnostics like the one above. It has no associated file name, and both errors and warnings are printed to std::cerr. This handler is constexpr-friendly.
stream_error_handler: Produces formatted diagnostics. One or two streams may be used. If two are used, errors go to one stream and warnings go to the other. A file name can be associated with the parse; if it is, that file name will appear in all diagnostics.
callback_error_handler: Produces formatted diagnostics. Calls a callback with the diagnostic message to report the diagnostic, rather than streaming out the diagnostic. A file name can be associated with the parse; if it is, that file name will appear in all diagnostics. This handler is useful for recording the diagnostics in memory.
rethrow_error_handler: Does nothing but re-throw any exception that it is asked to handle. Its diagnose() member functions are no-ops.
vs_output_error_handler: Directs all errors and warnings to the debugging output panel inside Visual Studio. Available on Windows only. Probably does nothing useful or desirable when executed outside of Visual Studio.
You can set the error handler to any of these, or one of your own, using with_error_handler() (see The parse() API). If you do not set one, default_error_handler will be used.
Boost.Parser only generates error messages like the ones in this page at failed expectation points, like a > b, where you have successfully parsed a, but then cannot successfully parse b.
This may seem limited to you. It's actually the best that we can do.
In order for error handling to happen other than at expectation points, we would have to know that there is no further processing that might take place. We cannot know that, because Boost.Parser has P1 | P2 | ... | Pn parsers ("or_parsers"). If any one of these parsers Pi fails to match, it is not allowed to fail the parse, because the next one (Pi+1) might match. If we get to the end of the alternatives of the or_parser and Pn fails, we still cannot fail the top-level parse, because the or_parser might be a subparser within a parent or_parser.
Ok, so what might we do? Perhaps we could at least indicate when we ran into end-of-input. But we cannot, for exactly the same reason already stated. For any parser P, reaching end-of-input is a failure for P, but not necessarily for the whole parse.
Perhaps we could record the farthest point ever reached during the parse, and report that at the top level, if the top level parser fails. That would be little help without knowing which parser was active when we reached that point. This would require some sort of repeated memory allocation, since in Boost.Parser the progress point of the parser is stored exclusively on the stack — by the time we fail the top-level parse, all those far-reaching stack frames are long gone. Not the best.
Worse still, knowing how far you got in the parse and which parser was active is not very useful. Consider this.
namespace bp = boost::parser;
auto a_b = bp::char_('a') >> bp::char_('b');
auto c_b = bp::char_('c') >> bp::char_('b');
auto result = bp::parse("acb", a_b | c_b);
If we reported the farthest-reaching parser and its position, it would be the a_b parser, at position "cb" in the input. Is this really enlightening? Was the error in the input putting the 'a' at the beginning or putting the 'c' in the middle? If you point the user at a_b as the parser that failed, and never mention c_b, you are potentially just steering them in the wrong direction.
All error messages must come from failed expectation points. Consider parsing JSON. If you open a list with '[', you know that you're parsing a list, and if the list is ill-formed, you'll get an error message saying so. If you open an object with '{', the same thing is possible: when the matching '}' is missing, you can tell the user, "That's not an object", and this is useful feedback. The same goes for a partially parsed number, etc. If the JSON parser does not build in expectations like matched braces and brackets, how can Boost.Parser know that a missing '}' is really a problem, and that no later parser will match the input even without the '}'?
Important: The bottom line is that you should build expectation points into your parsers using operator>.
You can get access to the error handler within any semantic action by calling _error_handler(ctx) (see The Parse Context). Any error handler must have the following member functions:
template<typename Context, typename Iter>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context,
    Iter it) const;

template<typename Context>
void diagnose(
    diagnostic_kind kind,
    std::string_view message,
    Context const & context) const;
If you call the second one, the one without the iterator parameter, it will call the first with _where(context).begin() as the iterator parameter. The one without the iterator is the one you will use most often. The one with the explicit iterator parameter can be useful in situations where you have messages that are related to each other, associated with multiple locations. For instance, if you are parsing XML, you may want to report that a close-tag does not match its associated open-tag by showing the line where the open-tag was found. That may of course not be located anywhere near _where(ctx).begin(). (A description of _globals() is below.)
[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            open_tag_msg,
            ctx,
            _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            close_tag_msg,
            ctx);

        // Explicitly fail the parse. Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}
There are also some convenience functions that make the above code a little less verbose, _report_error() and _report_warning():
[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _report_error(ctx, open_tag_msg, _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _report_error(ctx, close_tag_msg);

        // Explicitly fail the parse. Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}
You should use these less verbose functions almost all the time. The only time you would want to use _error_handler() directly is when you are using a custom error handler, and you want access to some part of its interface besides diagnose().
Though there is support for reporting warnings using the functions above, none of the error handlers supplied by Boost.Parser will ever report a warning. Warnings are strictly for user code.
For more information on the rest of the error handling and diagnostic API, see the header reference pages for error_handling_fwd.hpp and error_handling.hpp.
Creating your own error handler is pretty easy; you just need to implement three member functions. Say you want an error handler that writes diagnostics to a file. Here's how you might do that.
struct logging_error_handler
{
    logging_error_handler() {}
    logging_error_handler(std::string_view filename) :
        filename_(filename), ofs_(filename_)
    {
        if (!ofs_)
            throw std::runtime_error("Could not open file.");
    }

    // This is the function called by Boost.Parser after a parser fails the
    // parse at an expectation point and throws a parse_error. It is expected
    // to create a diagnostic message, and put it where it needs to go. In
    // this case, we're writing it to a log file. This function returns a
    // bp::error_handler_result, which is an enum with two enumerators -- fail
    // and rethrow. Returning fail fails the top-level parse; returning
    // rethrow just re-throws the parse_error exception that got us here in
    // the first place.
    template<typename Iter, typename Sentinel>
    bp::error_handler_result
    operator()(Iter first, Sentinel last, bp::parse_error<Iter> const & e) const
    {
        bp::write_formatted_expectation_failure_error_message(
            ofs_, filename_, first, last, e);
        return bp::error_handler_result::fail;
    }

    // This function is for users to call within a semantic action to produce
    // a diagnostic.
    template<typename Context, typename Iter>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context,
        Iter it) const
    {
        bp::write_formatted_message(
            ofs_,
            filename_,
            bp::_begin(context),
            it,
            bp::_end(context),
            message);
    }

    // This is just like the other overload of diagnose(), except that it
    // determines the Iter parameter for the other overload by calling
    // _where(ctx).
    template<typename Context>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context) const
    {
        diagnose(kind, message, context, bp::_where(context).begin());
    }

    std::string filename_;
    mutable std::ofstream ofs_;
};
That's it. You just need to do the important work of the error handler in its call operator, and then implement the two overloads of diagnose() that it must provide for use inside semantic actions. The default implementation of these is even available as the free function write_formatted_message(), so you can just call that, as you see above. Here's how you might use it.
int main()
{
    std::cout << "Enter a list of integers, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto parser = bp::int_ >> *(',' > bp::int_);
    logging_error_handler error_handler("parse.log");
    auto const result =
        bp::parse(input, bp::with_error_handler(parser, error_handler));

    if (result) {
        std::cout << "It looks like you entered:\n";
        for (int x : *result) {
            std::cout << x << "\n";
        }
    }
}
We just define a logging_error_handler, and pass it by reference to with_error_handler(), which decorates the top-level parser with the error handler. We could not have written bp::with_error_handler(parser, logging_error_handler("parse.log")), because with_error_handler() does not accept rvalues. This is because the error handler eventually goes into the parse context. The parse context only stores pointers and iterators, keeping it cheap to copy.
If we run the example and give it the input "1,", this shows up in the log file:
parse.log:1:2: error: Expected int_ here (end of input):
1,
  ^
Sometimes, during the writing of a parser, you make a simple mistake that is diagnosed horrifyingly, due to the high number of template instantiations between the line you just wrote and the point of use (usually, the call to parse()). By "sometimes", I mean "almost always and many, many times". Boost.Parser has a workaround for situations like this. The workaround is to make the ill-formed code well-formed in as many circumstances as possible, and then do a runtime assert instead.
Usually, C++ programmers try whenever they can to catch mistakes as early
as they can. That usually means making as much bad code ill-formed as possible.
Counter-intuitively, this does not work well in parser combinator situations.
For an example of just how dramatically different these two debugging scenarios
can be with Boost.Parser, please see the very long discussion in the none
is weird section of Rationale.
If you are morally opposed to this approach, or just hate fun, good news: you can turn off the use of this technique entirely by defining BOOST_PARSER_NO_RUNTIME_ASSERTIONS.
Debugging parsers is hard. Any parser above a certain complexity level is nearly impossible to debug simply by looking at the parser's code. Stepping through the parse in a debugger is even worse. To provide a reasonable chance of debugging your parsers, Boost.Parser has a trace mode that you can turn on simply by providing an extra parameter to parse() or callback_parse():
boost::parser::parse(input, parser, boost::parser::trace::on);
Every overload of parse() and callback_parse() takes this final parameter, which is defaulted to boost::parser::trace::off.
If we trace a substantial parser, we will see a lot of output. Each code point of the input must be considered, one at a time, to see if a certain rule matches. As an example, let's trace a parse using the JSON parser from Parsing JSON. The input is "null". null is one of the types that a JavaScript value can have; the top-level parser in the JSON parser example is:
auto const value_p_def = number | bp::bool_ | null | string | array_p | object_p;
So, a JSON value can be a number, or a Boolean, a null, etc. During the parse, each alternative will be tried in turn, until one is matched. I picked null because it is relatively close to the beginning of the value_p_def alternative parser. Even so, the output is pretty huge. Let's break it down as we go:
[begin value; input="null"]
Each parser is traced as [begin foo; ...], then the parsing operations themselves, and then [end foo; ...]. The name of a rule is used as its name in the begin and end parts of the trace. Non-rules have a name that is similar to the way the parser looked when you wrote it. Most lines will have the next few code points of the input quoted, as we have here (input="null").
  [begin number | bool_ | null | string | ...; input="null"]
This shows the beginning of the parser inside the rule value, the parser that actually does all the work. In the example code, this parser is called value_p_def. Since it isn't a rule, we have no name for it, so we show its implementation in terms of subparsers. Since it is a bit long, we don't print the entire thing. That's why that ellipsis is there.
    [begin number; input="null"]
      [begin raw[lexeme[ >> ...]][<<action>>]; input="null"]
Now we're starting to see the real work being done. number is a somewhat complicated parser that does not match "null", so there's a lot to wade through when following the trace of its attempt to do so. One thing to note is that, since we cannot print a name for an action, we just print "<<action>>". Something similar happens when we come to an attribute that we cannot print, because it has no stream insertion operation. In that case, "<<unprintable-value>>" is printed.
        [begin raw[lexeme[ >> ...]]; input="null"]
          [begin lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
            [begin -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
              [begin -char_('-'); input="null"]
                [begin char_('-'); input="null"]
                  no match
                [end char_('-'); input="null"]
                matched ""
                attribute: <<empty>>
              [end -char_('-'); input="null"]
              [begin char_('1', '9') >> *digit | char_('0'); input="null"]
                [begin char_('1', '9') >> *digit; input="null"]
                  [begin char_('1', '9'); input="null"]
                    no match
                  [end char_('1', '9'); input="null"]
                  no match
                [end char_('1', '9') >> *digit; input="null"]
                [begin char_('0'); input="null"]
                  no match
                [end char_('0'); input="null"]
                no match
              [end char_('1', '9') >> *digit | char_('0'); input="null"]
              no match
            [end -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
            no match
          [end lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
          no match
        [end raw[lexeme[ >> ...]]; input="null"]
        no match
      [end raw[lexeme[ >> ...]][<<action>>]; input="null"]
      no match
    [end number; input="null"]
    [begin bool_; input="null"]
      no match
    [end bool_; input="null"]
number and boost::parser::bool_ did not match, but null will:
    [begin null; input="null"]
      [begin "null" >> attr(null); input="null"]
        [begin "null"; input="null"]
          [begin string("null"); input="null"]
            matched "null"
            attribute:
          [end string("null"); input=""]
          matched "null"
          attribute: null
Finally, this parser actually matched, and the match generated the attribute null, which is a special value of the type json::value. Since we were matching a string literal "null" earlier, there was no attribute until we reached the attr(null) parser.
        [end "null"; input=""]
        [begin attr(null); input=""]
          matched ""
          attribute: null
        [end attr(null); input=""]
        matched "null"
        attribute: null
      [end "null" >> attr(null); input=""]
      matched "null"
      attribute: null
    [end null; input=""]
    matched "null"
    attribute: null
  [end number | bool_ | null | string | ...; input=""]
  matched "null"
  attribute: null
[end value; input=""]
--------------------
parse succeeded
--------------------
At the very end of the parse, the trace code prints out whether the top-level parse succeeded or failed.
Some things to be aware of when looking at Boost.Parser trace output:
p[a] forms an action_parser containing the parser p and semantic action a. This is essentially an implementation detail, but unfortunately the trace output does not hide this from you.
For a parser p, the trace-name may be intentionally different from the actual structure of p. For example, in the trace above, you see a parser called simply "null". This parser is actually boost::parser::omit[boost::parser::string("null")], but what you typically write is just "null", so that's the name used. There are two special cases like this: the one described here for omit[string], and another for omit[char_].
if_(pred)[p] is described as "Equivalent to eps(pred) >> p". In a trace, you will not see if_; you will see eps and p instead.