It is very common to need to parse quoted strings. Quoted strings are slightly
tricky, though, when using a skipper (and you should be using a skipper 99%
of the time). You don't want to allow arbitrary whitespace in the middle
of your strings, and you also don't want to remove all whitespace from your
strings. Both of these things will happen with the typical skipper, ws.
So, here is how most people would write a quoted string parser:
namespace bp = boost::parser;
const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
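Some things to note: lexeme[] disables skipping in the parser, and it must be
written around the quotes, not around the operator* expression; and, because
the body is *(bp::char_ - '"'), there is no way to write a quote character in
the middle of the string.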
This is a very common pattern. I have written a quoted string parser like this dozens of times. The parser above is the quick-and-dirty version. A more robust version would handle escaped quotes within the string, and as soon as it does, it also needs to support escaping the escape character itself.
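To make the "more robust version" concrete, here is a minimal hand-written sketch in plain standard C++. It is not Boost.Parser code and not how the library is implemented; `parse_quoted` is a hypothetical helper, and rejecting unknown escapes is a choice made for this sketch. It shows why the two escapes come as a pair: once backslash can escape a quote, a string that ends in a literal backslash can only be written if backslash can escape itself.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Boost.Parser): parses a double-quoted
// string in which a backslash escapes a quote or another backslash.
std::optional<std::string> parse_quoted(std::string_view input)
{
    if (input.empty() || input.front() != '"')
        return std::nullopt; // no opening quote

    std::string out;
    for (std::size_t i = 1; i < input.size(); ++i) {
        char const c = input[i];
        if (c == '"') {
            // Closing quote: succeed only if it is the last character.
            if (i + 1 == input.size())
                return out;
            return std::nullopt;
        }
        if (c == '\\') {
            // An escape must be followed by a quote or a backslash.
            if (i + 1 == input.size())
                return std::nullopt;
            char const next = input[i + 1];
            if (next != '"' && next != '\\')
                return std::nullopt;
            out += next;
            ++i; // consume the escaped character as well
            continue;
        }
        out += c; // ordinary character; whitespace is kept as-is
    }
    return std::nullopt; // input ended before the closing quote
}
```

With this sketch, the input `"a\\"` (quote, a, two backslashes, quote) yields `a\`; without the escaped backslash, the final `\"` would be read as an escaped quote and the string would never terminate.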
Boost.Parser provides quoted_string to use in place of this very common
pattern. It supports escaping both the quote character and the escape
character itself, using backslash as the escape character.
namespace bp = boost::parser;
auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text
auto result2 = bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"
As common as this use case is, there are very similar use cases that it does
not cover. So, quoted_string has some options.
If you call it with a single character, it returns a quoted_string that uses
that single character as the quote-character.
auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text
You can also supply a range of characters. One of the characters from the
range must quote both ends of the string; mismatches are not allowed. Think
of how Python allows you to quote a string with either '"' or '\'', but the
same character must be used on both sides.
auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text
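The matching rule can be illustrated with a small sketch in plain standard C++. This is only an illustration of the rule as described, not the library's implementation; `parse_with_quotes` is a hypothetical helper, and it ignores escaping entirely to keep the focus on quote matching.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Boost.Parser): `quotes` lists the
// accepted quote characters. Escapes are deliberately not handled here.
std::optional<std::string> parse_with_quotes(
    std::string_view input, std::string_view quotes)
{
    assert(!quotes.empty() && "at least one quote character is required");
    if (input.size() < 2)
        return std::nullopt; // too short to hold two quotes

    char const open = input.front();
    if (quotes.find(open) == std::string_view::npos)
        return std::nullopt; // does not start with an accepted quote

    // Only the character that opened the string may close it, and it
    // must be the last character of the input.
    std::size_t const close = input.find(open, 1);
    if (close != input.size() - 1)
        return std::nullopt;

    return std::string(input.substr(1, close - 1));
}
```

In this sketch, the other quote characters from the set are ordinary characters inside the body, so `'say "hi"'` yields `say "hi"`, while `'some text"` is rejected because the opening `'` is never closed.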
Another common thing to do in a quoted string parser is to recognize escape
sequences. If you have simple escape sequences that do not require any real
parsing, like say the simple escape sequences from C++, you can provide a
symbols object as well. The template parameter T to symbols<T> must be char
or char32_t. You don't need to include the escaped backslash or the escaped
quote character, since those always work.
// the C++ simple escapes
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
// The input contains a backslash followed by 'r'; the escapes table maps it to a carriage return.
auto result5 = bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text
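The effect of such an escape table can be sketched in plain standard C++. This is a rough model under stated assumptions, not the library's implementation: `escape_table` and `unescape` are hypothetical names, the table is keyed by a single character (a real symbols table is keyed by strings), and rejecting unknown escapes is a choice made for this sketch. The escaped backslash and escaped quote are handled before the table is consulted, which is why they need no entries.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a symbols<char> table: maps the character
// that follows the backslash to the character it denotes.
using escape_table = std::unordered_map<char, char>;

// Hypothetical helper: translates escape sequences in the body of a
// quoted string (the text between the quotes).
std::optional<std::string> unescape(
    std::string_view body, char quote, escape_table const & escapes)
{
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (body[i] != '\\') {
            out += body[i]; // ordinary character
            continue;
        }
        if (++i == body.size())
            return std::nullopt; // dangling backslash at the end
        char const c = body[i];
        if (c == '\\' || c == quote) {
            out += c; // built-in escapes, no table entry needed
            continue;
        }
        auto const it = escapes.find(c);
        if (it == escapes.end())
            return std::nullopt; // escape not in the table
        out += it->second;
    }
    return out;
}
```

Given a table containing `{'r', '\r'}`, the body `some text\r` (with a literal backslash and `r`) becomes `some text` followed by a carriage return.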