std::string has a bunch of member functions that are essentially algorithms
hanging off of a string type for no reason. Rather than repeating this with
the types in the text layer, Boost.Text provides these algorithms as free
functions, like all the other C++ algorithms. Unlike all the other C++
algorithms, the Boost.Text string algorithms work with ranges that model
code_point_range or grapheme_range, and even with pointers to null-terminated
strings, in all the permutations you might need. You can use ranges, just
like with the algorithms in std::ranges:
assert(boost::text::starts_with(U"here", U"her")); // Like std::ranges::starts_with().
You can use any range that models utf_range_like or grapheme_range. You can
also compare a range to a C-style null-terminated string pointer, even if the
C-style string is in a different UTF format:
char const * her = "her";
assert(boost::text::starts_with(U"here", her));
And of course, the text-layer types are supported:
boost::text::rope here = "here";
boost::text::text_view her = "her";
assert(boost::text::starts_with(here, her));
Tip: If you like using the Boost.StringAlgo library, the text-layer types work
with its algorithms as well. Just be sure to use the code point iterators of
your favorite text-layer type; Boost.StringAlgo doesn't know anything about
graphemes.
See string_algorithm.hpp for a list of all the algorithms.
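
To make the free-function design concrete, here is a minimal sketch of a prefix test written as a free function template rather than as a member. It is not Boost.Text's implementation: the `sketch` namespace and this `starts_with` are hypothetical, and unlike the real algorithms it does no transcoding, so both ranges must hold the same kind of element.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

    // Hypothetical free-function prefix test (not the Boost.Text version).
    // Accepts any two ranges whose elements compare with ==; no member
    // function on either type is required.
    template<typename R1, typename R2>
    bool starts_with(R1 const & r, R2 const & prefix)
    {
        auto const prefix_last = std::end(prefix);
        // mismatch() stops at the first differing element, or when either
        // range is exhausted. The prefix matches only if it was consumed
        // entirely.
        return std::mismatch(
                   std::begin(r), std::end(r),
                   std::begin(prefix), prefix_last)
                   .second == prefix_last;
    }

    // Usage: the two ranges do not need to have the same type.
    // Note: string literals are avoided here because, as arrays, they would
    // include the trailing null terminator in the comparison.
    inline void example()
    {
        std::u32string const here = U"here";
        std::vector<char32_t> const her = {U'h', U'e', U'r'};
        std::u32string_view const there = U"there";

        assert(starts_with(here, her));
        assert(!starts_with(here, there));
    }

}
```

Because the algorithm is a template over both range types, one definition covers every pairing of containers and views, which is the kind of permutation coverage that member functions cannot provide without duplication.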
There are a lot of small utility functions scattered throughout the Boost.Text
headers. My philosophy is that if I have to write a function f() to implement
a major feature, and f() seems like it might be useful to someone, somewhere,
ever, I'm going to keep f() publicly available, rather than putting it in a
detail namespace. I'm not going to list all the available functions, but here
are some pointers to where you can find them:
- transcode_iterator.hpp contains a variety of utility functions that give
info on code points and UTF-8 code units. For instance, it contains
replacement_character(), which just gives you the value of the Unicode
replacement character, and scalar_value(), which tells you whether a given
unsigned int is in the range of Unicode scalar values.
- There are a couple of overloads of to_string() that can be used to construct
a std::string from a range of code points.
- General-purpose algorithms can be found in algorithm.hpp.
These are essentially back-ports of algorithms that are available anywhere
from C++26 on back to C++98. They differ from the ones in std
in that they are sentinel-friendly, and some have different names, because
they predate the ones in the standard.
- There are numerous foo_prop() functions, where foo is word, line, etc. These
give the text segmentation- or algorithm-specific property for any code point.
See word_prop() in word_break.hpp for an example.
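
Two ideas from the list above, a scalar-value check and a sentinel-friendly algorithm, can be sketched together in standalone C++. Everything in the `sketch` namespace is a hypothetical stand-in written for illustration, not the Boost.Text API: `replacement_char`, `is_scalar_value()`, `null_sentinel`, and this `find_if_not()` are assumed names. The only facts relied on are that the Unicode replacement character is U+FFFD and that scalar values are the code points up to U+10FFFF excluding the surrogates U+D800 through U+DFFF.

```cpp
#include <cassert>

namespace sketch {

    // Hypothetical stand-in: U+FFFD REPLACEMENT CHARACTER.
    constexpr char32_t replacement_char = 0xFFFD;

    // Hypothetical stand-in for a scalar-value check: true for any code
    // point in [0, 0x10FFFF] that is not a surrogate (0xD800 to 0xDFFF).
    constexpr bool is_scalar_value(char32_t cp)
    {
        return cp <= 0x10FFFF && !(0xD800 <= cp && cp <= 0xDFFF);
    }

    // A sentinel that compares equal to a pointer once that pointer reaches
    // a null terminator. It lets an algorithm walk a C-style string without
    // first computing its length.
    struct null_sentinel
    {
        template<typename T>
        friend constexpr bool operator==(T const * p, null_sentinel)
        {
            return *p == T{};
        }
        template<typename T>
        friend constexpr bool operator!=(T const * p, null_sentinel s)
        {
            return !(p == s);
        }
    };

    // "Sentinel-friendly" means the end marker may have a different type
    // from the iterator; only first != last needs to be valid.
    template<typename Iter, typename Sentinel, typename Pred>
    constexpr Iter find_if_not(Iter first, Sentinel last, Pred pred)
    {
        while (first != last && pred(*first))
            ++first;
        return first;
    }

    // Usage: locate the first code point that is not a scalar value.
    inline void example()
    {
        // 'a', U+1F600, a lone surrogate, 'b', then the null terminator.
        char32_t const text[] = {U'a', 0x1F600, 0xD800, U'b', 0};
        char32_t const * bad =
            find_if_not(text, null_sentinel{}, is_scalar_value);
        assert(bad == text + 2); // Stops at the surrogate.
    }

}
```

The classic std algorithms require both ends of a range to have the same iterator type, so searching a null-terminated string with them means finding the terminator first. Accepting a separate sentinel type avoids that extra pass.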