
Parsing Quoted Strings

It is very common to need to parse quoted strings. Quoted strings are slightly tricky, though, when using a skipper (and you should be using a skipper 99% of the time). You don't want to allow arbitrary whitespace in the middle of your strings, and you also don't want to remove all whitespace from your strings. Both of these things will happen with the typical skipper, ws.
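To make the problem concrete, here is a minimal sketch in plain standard C++ rather than Boost.Parser. The function parse_quoted and its skip_inside flag are hypothetical names invented for this illustration. With the flag set, whitespace is skipped before every element, which mimics an active skipper inside the quotes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; not part of Boost.Parser.
// Parses a double-quoted string. When skip_inside is true, whitespace is
// skipped before every element, including each character of the contents.
std::optional<std::string> parse_quoted(std::string_view in, bool skip_inside)
{
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i])))
            ++i;
    };

    skip_ws(); // skipping before the opening quote is always fine
    if (i == in.size() || in[i] != '"')
        return std::nullopt;
    ++i;

    std::string out;
    for (;;) {
        if (skip_inside)
            skip_ws();
        if (i == in.size())
            return std::nullopt; // no closing quote
        if (in[i] == '"')
            return out;
        out += in[i++];
    }
}
```

With skip_inside set to true, the input "some text" (quotes included) yields sometext; with it set to false, the space survives.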

So, here is how most people would write a quoted string parser:

namespace bp = boost::parser;
const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];

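Some things to note: the result is a std::string that does not include the quotes; lexeme[] disables the skipper for everything inside it, so it must wrap the quotes as well as the contents; the > before the closing quote is an expectation point, so a missing close-quote is an error rather than an ordinary failed match; and there is no way to write a quote character inside the string.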

This is a very common pattern. I have written a quoted string parser like this dozens of times. The parser above is the quick-and-dirty version. A more robust version would be able to handle escaped quotes within the string, and then would immediately also need to support escaped escape characters.
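The escaping requirement can be sketched in plain standard C++. This is only an illustration of the rule, not how Boost.Parser implements anything, and unquote is a hypothetical name. It assumes that a backslash makes the following character literal, which covers both the escaped quote and the escaped backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not Boost.Parser code.
// Expects `in` to start with '"' and returns the unquoted contents.
// Anything after the closing quote is ignored.
std::optional<std::string> unquote(std::string_view in)
{
    if (in.empty() || in.front() != '"')
        return std::nullopt;

    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == '"')
            return out;              // unescaped quote closes the string
        if (c == '\\') {
            if (++i == in.size())
                return std::nullopt; // dangling backslash
            out += in[i];            // escaped quote or escaped backslash
        } else {
            out += c;
        }
    }
    return std::nullopt;             // input ended before the closing quote
}
```

The escaped backslash is needed as soon as escaped quotes exist. Without it, a string whose last character is a backslash could not be written, because the backslash would swallow the closing quote.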

Boost.Parser provides quoted_string to use in place of this very common pattern. It supports escaping both the quote character and the escape character itself, using backslash as the escape character.

namespace bp = boost::parser;

auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text

auto result2 =
    bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"

As common as this use case is, there are very similar use cases that it does not cover. So, quoted_string has some options. If you call it with a single character, it returns a quoted_string that uses that single character as the quote-character.

auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text

You can also supply a range of characters. One of the characters from the range must quote both ends of the string; mismatches are not allowed. Think of how Python allows you to quote a string with either '"' or '\'', but the same character must be used on both sides.

auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text
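The matching rule can be sketched in plain standard C++ as follows. This is a hypothetical helper (unquote_any is an invented name), not Boost.Parser code, and it leaves out escapes to keep the focus on the quote matching.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not Boost.Parser code.
// `quotes` lists the permitted quote characters. Whichever one opens the
// string is the only one that may close it. Escapes are left out for brevity.
std::optional<std::string> unquote_any(std::string_view in, std::string_view quotes)
{
    if (in.empty() || quotes.find(in.front()) == std::string_view::npos)
        return std::nullopt;          // does not start with a permitted quote

    char const open = in.front();
    auto const close = in.find(open, 1);
    if (close == std::string_view::npos)
        return std::nullopt;          // opening quote was never matched
    return std::string(in.substr(1, close - 1));
}
```

Because only the opening character can close the string, the other permitted quote character may appear unescaped in the contents, as in "it's".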

Another common thing to do in a quoted string parser is to recognize escape sequences. If you have simple escape sequences that do not require any real parsing, such as the simple escape sequences from C++, you can provide a symbols object as well. The template parameter T to symbols<T> must be char or char32_t. You don't need to include the escaped backslash or the escaped quote character, since those always work.

// the c++ simple escapes
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
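// Note the doubled backslash: the input contains the two characters \ and r,
// which the escapes table turns into a carriage return.
auto result5 =
    bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);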
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text
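The lookup that happens after a backslash can be pictured with a small sketch in plain standard C++. The function resolve_escape and the std::map standing in for the symbols table are assumptions made for this illustration, not Boost.Parser API. What to do with an unrecognized escape is left to the caller here, and this sketch makes no claim about how Boost.Parser treats that case.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical helper for illustration; not Boost.Parser code.
// Resolves the character that follows a backslash inside a quoted string.
std::optional<char> resolve_escape(char c, char quote, std::map<char, char> const & table)
{
    if (c == '\\' || c == quote)
        return c;                    // always available, no table entry needed
    if (auto const it = table.find(c); it != table.end())
        return it->second;           // user-supplied escape, e.g. 'n' -> '\n'
    return std::nullopt;             // unknown escape; policy is up to the caller
}
```

With a table containing entries for n, r, and t, the character r resolves to a carriage return, while the quote character and the backslash resolve to themselves without any table entry.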

