Boost.Text is composed of two main layers: the Unicode layer and the text layer.
There are also a few assorted bits that were necessary or useful to have around when implementing various parts of Boost.Text: segmented_vector, unencoded_rope, unencoded_rope_view, and trie/trie_map/trie_set.
The Unicode layer provides a few Unicode-related utility types, but it consists primarily of the Unicode algorithms. These algorithms are written in the style of the standard algorithms, with range-friendly interfaces, and for each of the Unicode algorithms there is a corresponding view. There are algorithms for these Unicode operations:
- Case mapping (to_upper(), is_lower(), etc.)
These algorithms are independent of the text layer; it is possible to use Boost.Text as a Unicode library without using the text layer at all.
The text layer is built on top of the Unicode layer. Its types encode text as UTF-8 and maintain normalization, and much of their implementation is done in terms of the algorithms from the Unicode layer. The types in this layer are text, text_view, rope, and rope_view. The layer also contains templates that can be instantiated with different UTF formats, normalization forms, and/or underlying storage.
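As a rough illustration of a template that maintains an invariant over interchangeable underlying storage, here is a minimal sketch assuming C++17. The names basic_utf8_text, is_valid_utf8, string_text, and vector_text are hypothetical and are not Boost.Text's text templates. The sketch enforces only well-formed UTF-8 (rejecting overlong forms, surrogates, and values above U+10FFFF); it does not normalize, since normalization requires the Unicode character data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Returns true if s is well-formed UTF-8: correct lead and continuation
// bytes, no truncated sequences, no overlong forms, no surrogates, and no
// code points above U+10FFFF.
inline bool is_valid_utf8(std::string_view s) {
    constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char const c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (c < 0x80) { ++i; continue; }                      // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (s.size() - i < len) return false;                 // truncated
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char const cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;            // not a continuation
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_for_len[len]) return false;              // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;       // surrogate
        if (cp > 0x10FFFF) return false;                      // out of range
        i += len;
    }
    return true;
}

// Hypothetical text template parameterized on its storage. The invariant is
// that the stored bytes are always well-formed UTF-8.
template <typename Storage>
class basic_utf8_text {
public:
    // Appends s if it is well-formed UTF-8; otherwise returns false and
    // leaves the text unchanged.
    bool append(std::string_view s) {
        if (!is_valid_utf8(s)) return false;
        storage_.insert(storage_.end(), s.begin(), s.end());
        return true;
    }
    std::string_view view() const { return {storage_.data(), storage_.size()}; }
    std::size_t size_bytes() const { return storage_.size(); }

private:
    Storage storage_;
};

// Two instantiations that differ only in their underlying storage.
using string_text = basic_utf8_text<std::string>;
using vector_text = basic_utf8_text<std::vector<char>>;

}  // namespace sketch
```

Because a well-formed UTF-8 string always ends on a complete code point, appending another well-formed string yields well-formed UTF-8, so validating only the incoming bytes is enough to preserve the invariant.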
Finally, there are some items I wrote while implementing everything else that rise to the level of general utility.
First is segmented_vector<T>. This is a discontiguous sequence of T in which insertions anywhere in the sequence are cheap, and copies are very cheap thanks to a copy-on-write mechanism. It is a generalization of unencoded_rope for arbitrary T.
The remaining assorted types are trie, trie_map, and trie_set. The first of these is a trie that is not a valid C++ container. The latter two are analogous to std::map and std::set, respectively, but are built on a trie instead of a binary tree.
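A trie stores each key as a path of per-element edges, so keys that share a prefix share the nodes for that prefix, and lookup cost depends on key length rather than on the number of stored keys. The following is a minimal sketch assuming C++17; trie_map here, with its insert_or_assign, find, and longest_prefix_match members, is hypothetical and is not Boost.Text's trie_map interface. For simplicity, each node keeps its children in a std::map<char, ...> and keys are limited to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string_view>
#include <utility>

namespace sketch {

// Hypothetical string-keyed trie map (illustration only).
template <typename V>
class trie_map {
    struct node {
        std::map<char, std::unique_ptr<node>> children;  // next char -> child
        std::optional<V> value;                          // set if a key ends here
    };

public:
    // Walks (and creates) one node per character, then stores the value.
    void insert_or_assign(std::string_view key, V value) {
        node* n = &root_;
        for (char c : key) {
            auto& child = n->children[c];
            if (!child) child = std::make_unique<node>();
            n = child.get();
        }
        if (!n->value) ++size_;
        n->value = std::move(value);
    }

    // Returns a pointer to the value for key, or nullptr if key is absent.
    V const* find(std::string_view key) const {
        node const* n = find_node(key);
        return (n && n->value) ? &*n->value : nullptr;
    }

    // Length of the longest stored key that is a prefix of s, if any. This
    // query falls out of the trie structure in a single walk down the tree.
    std::optional<std::size_t> longest_prefix_match(std::string_view s) const {
        node const* n = &root_;
        std::optional<std::size_t> best;
        if (n->value) best = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            auto it = n->children.find(s[i]);
            if (it == n->children.end()) break;
            n = it->second.get();
            if (n->value) best = i + 1;
        }
        return best;
    }

    std::size_t size() const { return size_; }

private:
    node const* find_node(std::string_view key) const {
        node const* n = &root_;
        for (char c : key) {
            auto it = n->children.find(c);
            if (it == n->children.end()) return nullptr;
            n = it->second.get();
        }
        return n;
    }

    node root_;
    std::size_t size_ = 0;
};

}  // namespace sketch
```

Compared with std::map, which compares whole keys at each of O(log n) tree levels, the trie examines each character of the key once, and prefix queries such as longest_prefix_match come almost for free.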