PrevUpHomeNext

trie, trie_map, and trie_set

A Trie is a is a prefix-tree that associates keys (which must be sequences) with values. Each node represents a prefix, and each child node represents a possible next element of the key. Here's a diagram of one:

This is a trie for the key-value pairs Elements = {("car", 15), ("cat", 9), ("cats", 11), ("ear", -3), ("eat", 100)}. Each node represents a prefix of one or more of the key values in Elements. The text in on the left in the node header is the prefix represented by that node. This is for illustration only; the real data structure does not store so much data in each node. The value on the right of each node header is the value associated with that prefix in Elements, if any. For instance, "car" is associated with 15 in Elements, and so the "car" node has a value of 15, but "ca" is not in Elements, so the "ca" node has no associated value.

A node's children are all of that node's suffixes within the keys in Elements. To find a value associated with some key k, you simply start at the root, look for the child node for element k[0], then from that node look for the child node that contains the element k[1], etc.

This data structure has the nice property that a search is linear in the length of the searched-for key k, but is independent of the size of the trie. This is similar to how hash lookups are (in an ideal hash table) independent of the size of the table.

Lookup performance is comparable to hashing containers, except that a trie additionally allows one to easily do longest-match and match-extension queries. This is something that is unavailable in either tree-based maps or hash maps.

For example:

// Starting with a trie, we can:
boost::text::trie<std::string, int> trie(
    {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}});

// look up a specific element;
assert(trie["foo"] == 13);

// look for the node that contains as much of the given sequence as possible;
auto fo_subsequence = trie.longest_subsequence("fo");
assert(!fo_subsequence.match);

// try to extend the subsequence by one node;
auto foo_subsequence = trie.extend_subsequence(fo_subsequence, 'o');
assert(foo_subsequence.match);

// or try to extend the subsequence by multiple nodes;
auto const os_str = "os";
auto foos_subsequence =
    trie.extend_subsequence(fo_subsequence, os_str, os_str + 2);
assert(foos_subsequence.match);

trie::operator[] works differently from how it works in std::map-like containers. It works how I wish the std::map::operator[] did work:

boost::text::trie<std::string, int> trie(
    {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}});

// This is the kind of thing you expect from a `std::map`.
assert(trie["foo"] == 13);
trie["foo"] = 111;
assert(trie["foo"] == 111);

// What operator[] actually returns though is an optional_ref.
boost::text::optional_ref<int> element = trie["foo"];
// You can ask the optional_ref if it refers to something or not, because it
// has an explicit conversion to bool.
if (element) {
    // You can read from and write to it as if it was its template parameter
    // type T, because it has an implicit conversion to that:
    int element_as_int = element;
    assert(element_as_int == 111);
    // Use element_as_int here....
}

boost::text::trie<std::string, int> const const_trie(
    {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}});

// Because it returns this optional_ref, it's okay to use with constant tries.
// operator[] does not mutate the trie, even for mutable tries.
auto element_2 = const_trie["foo"]; // Whaaaaat?
if (element_2) {
    int element_as_int = element_2;
    assert(element_as_int == 13);
    // Use element_as_int here....
}

boost::text::trie<std::string, bool> const bool_trie(
    {{"foo", true}, {"bar", true}, {"foos", false}, {"", true}});

// What about optional_ref<bool> you ask?  There are specializations for it
// that have no implcit conversions.  You just have to use the familiar
// operator* to get its bool value as you would with boost::optional or
// std::optional.
auto element_3 = bool_trie["foos"];
if (element_3) {
    // bool element_as_bool = element_3; // Error! No implicit conversion to bool.
    bool element_as_bool = *element_3;
    assert(element_as_bool == false);
    // Use element_as_bool here....
}

The only wonky thing about optional_ref is the specialization for bool. Think about how often a map from T to bool comes up. It's almost never, because that would more naturally be expressed as a set of T anyway.

[Note] Note

trie is not a container. It has no iterators. If you need a version of trie that is iterable, use trie_map or trie_set instead. Note that there is a modest performance overhead associated with the iterator implementations, which is why trie does not have them.


PrevUpHomeNext