A Trie is a is a prefix-tree that associates keys (which must be sequences) with values. Each node represents a prefix, and each child node represents a possible next element of the key. Here's a diagram of one:
This is a trie for the key-value pairs Elements
= {("car", 15), ("cat", 9), ("cats", 11), ("ear", -3),
("eat", 100)}
.
Each node represents a prefix of one or more of the key values in Elements
. The text in on the left in the
node header is the prefix represented by that node. This is for illustration
only; the real data structure does not store so much data in each node. The
value on the right of each node header is the value associated with that prefix
in Elements
, if any. For instance,
"car"
is associated with
15
in Elements
,
and so the "car"
node
has a value of 15
, but "ca"
is not in Elements
,
so the "ca"
node has
no associated value.
A node's children are all of that node's suffixes within the keys in Elements
. To find a value associated with
some key k
, you simply start
at the root, look for the child node for element k[0]
,
then from that node look for the child node that contains the element k[1]
, etc.
This data structure has the nice property that a search is linear in the length
of the searched-for key k
,
but is independent of the size of the trie. This is similar to how hash lookups
are (in an ideal hash table) independent of the size of the table.
Lookup performance is comparable to hashing containers, except that a trie additionally allows one to easily do longest-match and match-extension queries. This is something that is unavailable in either tree-based maps or hash maps.
For example:
// Starting with a trie, we can: boost::text::trie<std::string, int> trie( {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}}); // look up a specific element; assert(trie["foo"] == 13); // look for the node that contains as much of the given sequence as possible; auto fo_subsequence = trie.longest_subsequence("fo"); assert(!fo_subsequence.match); // try to extend the subsequence by one node; auto foo_subsequence = trie.extend_subsequence(fo_subsequence, 'o'); assert(foo_subsequence.match); // or try to extend the subsequence by multiple nodes; auto const os_str = "os"; auto foos_subsequence = trie.extend_subsequence(fo_subsequence, os_str, os_str + 2); assert(foos_subsequence.match);
trie::operator[]
works
differently from how it works in std::map
-like
containers. It works how I wish the std::map::operator[]
did work:
boost::text::trie<std::string, int> trie( {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}}); // This is the kind of thing you expect from a `std::map`. assert(trie["foo"] == 13); trie["foo"] = 111; assert(trie["foo"] == 111); // What operator[] actually returns though is an optional_ref. boost::text::optional_ref<int> element = trie["foo"]; // You can ask the optional_ref if it refers to something or not, because it // has an explicit conversion to bool. if (element) { // You can read from and write to it as if it was its template parameter // type T, because it has an implicit conversion to that: int element_as_int = element; assert(element_as_int == 111); // Use element_as_int here.... } boost::text::trie<std::string, int> const const_trie( {{"foo", 13}, {"bar", 17}, {"foos", 19}, {"", 42}}); // Because it returns this optional_ref, it's okay to use with constant tries. // operator[] does not mutate the trie, even for mutable tries. auto element_2 = const_trie["foo"]; // Whaaaaat? if (element_2) { int element_as_int = element_2; assert(element_as_int == 13); // Use element_as_int here.... } boost::text::trie<std::string, bool> const bool_trie( {{"foo", true}, {"bar", true}, {"foos", false}, {"", true}}); // What about optional_ref<bool> you ask? There are specializations for it // that have no implcit conversions. You just have to use the familiar // operator* to get its bool value as you would with boost::optional or // std::optional. auto element_3 = bool_trie["foos"]; if (element_3) { // bool element_as_bool = element_3; // Error! No implicit conversion to bool. bool element_as_bool = *element_3; assert(element_as_bool == false); // Use element_as_bool here.... }
The only wonky thing about optional_ref
is the specialization for bool
.
Think about how often a map from T
to bool
comes up. It's almost
never, because that would more naturally be expressed as a set of T
anyway.
Note | |
---|---|
|