PrevUpHomeNext

Examples

Ropes in Practice

A key use of ropes is in text editors. Every text editor you've ever used that handles large files gracefully has done something to break the file into chunks -- editing the beginning of a 16GB file that is stored in a contiguous sequence in memory is way too slow. The use of a rope is a common approach to remedy this.

In the example directory, there is a program rope_editor. It uses a rope to store the file you're editing. Even when typing at the beginning of a 1GB file, it's very snappy.

[Note] Note

This editor is pretty lame. It doesn't have cut/copy/paste, and can't even open a new file! It uses Emacs keybindings for what little it does.

Let's take a look at the editor. We'll get a sense for how Boost.Text is used in a larger example. The editor is built using libcurses, a popular Unix library for for making character-based UIs suitable for terminals. It stores the file to edit in a single rope, and uses Boost.Text's line breaking algorithm to determine what portions of the rope are on each line. Each character cell in the editor is a grapheme, not a code point or code unit. It probably mis-renders graphemes containing multiple combining code points — again, it's very minimal. However, all the logic is done in terms of graphemes.

#include "app_state.hpp"
#include "curses_interface.hpp"

#include <boost/filesystem/operations.hpp>
#include <boost/filesystem/path.hpp>

#include <iostream>


int main(int argc, char * argv[])
{
    if (argc < 2) {
        std::cerr << "error: You must supply at least a filename.\n";
        return 1;
    }

    boost::filesystem::path path(argv[1]);
    if (!exists(path)) {
        std::cerr << "error: Could not access filename " << argv[1] << ".\n";
        return 1;
    }

    // Requirement of libcurses.
    std::locale::global(std::locale(""));

    curses_interface_t curses_interface;

    // Read the size, but just this once; resizing is not supported.
    auto const screen_size = curses_interface.screen_size();

    app_state_t app_state = {
        // Initial state is the contents of the file from the command line.
        load_buffer(path, curses_interface.screen_size().col_),
        // Keybindings are simplified Emacs.
        emacs_lite()};
    // Initial render.
    render(app_state.buffer_, screen_size);

    // Loop until the user says we're done.  Each iteration blocks for an
    // event from libcurses, then passes the current app state and the event
    // to update(), which returns the new app state.
    boost::optional<app_state_t> next_app_state;
    while (
        (next_app_state =
             update(std::move(app_state), curses_interface.next_event()))) {
        app_state = *next_app_state;
        render(app_state.buffer_, screen_size);
    }
}

First, here's main.cpp. It's not terribly interesting; it's little more than an event loop. Do take note of one thing, though. The main update loop uses no reference semantics. The application state is moved into the update() function, and a new state is returned. This pattern is attractive when your primary data structure is a rope or segmented_vector, since they're so cheap to copy. The only reason for the move here is that the undo history uses a std::vector, which is a lot less copy friendly. Without that std::vector, we could use value semantics even without the moves.

// Represents a key press or a mouse event.
struct key_code_t
{
    key_code_t() : key_(0), x_(0), y_(0) {}
    key_code_t(int c) : key_(c), x_(0), y_(0) {}
    key_code_t(key k);
    key_code_t(int k, int x, int y) : key_(k), x_(x), y_(y) {}

    bool operator==(key_code_t rhs) const { return key_ == rhs.key_; }
    bool operator!=(key_code_t rhs) const { return !(*this == rhs); }

    key_sequence_t operator,(key_code_t rhs) &&;

    int key_;
    int x_;
    int y_;
};

// A sequence of key_code_ts.  Using the ctrl_t and alt_t types and the
// function below, a key_sequence_t can be made using a natural syntax like
// "ctrl-'x', ctrl-'c'" (which means a Control-x followed by a Control-C.
struct key_sequence_t
{
    static const int max_size = 8;

    using iterator =
        boost::container::static_vector<key_code_t, max_size>::const_iterator;

    key_sequence_t() {}
    key_sequence_t(char c) { keys_.push_back(c); }
    key_sequence_t(key k) { keys_.push_back(k); }
    key_sequence_t(key_code_t k) { keys_.push_back(k); }

    bool operator==(key_sequence_t rhs) const { return keys_ == rhs.keys_; }
    bool operator!=(key_sequence_t rhs) const { return !(*this == rhs); }

    bool single_key() const { return keys_.size() == 1; }

    key_code_t get_single_key() const
    {
        assert(single_key());
        return keys_.front();
    }

    iterator begin() const { return keys_.begin(); }
    iterator end() const { return keys_.end(); }

    void append(key_code_t k) { keys_.push_back(k); }

    key_sequence_t operator,(key_code_t k) &&
    {
        key_sequence_t retval(*this);
        keys_.push_back(k);
        return retval;
    }

private:
    boost::container::static_vector<key_code_t, max_size> keys_;
};

Here we define what a key or mouse event looks like, and how to represent a chain of them for making commands that require multiple keystrokes.

// A key mapping that uses a subset of the Emacs key bindings.
key_map_t emacs_lite();

We'll see this function defined later, and we saw it used in main to provide the key bindings used to initialize the app state.

#ifndef EDITOR_BUFFER_HPP
#define EDITOR_BUFFER_HPP

#include "event.hpp"

#include <boost/text/line_break.hpp>
#include <boost/text/rope.hpp>
#include <boost/text/text.hpp>
#include <boost/text/segmented_vector.hpp>
#include <boost/text/word_break.hpp>

#include <boost/filesystem/fstream.hpp>

#include <vector>


// This is our underlying storage type.  This editor takes advantage of the
// efficient copies this type provides.  If you change this to
// boost::text::text, the editor becomes unusably slow.
using content_t = boost::text::rope;

// This is the data needed for laying out and editing within a single line as
// it appears in one of the editor's buffers.  Each column in the editor
// represents a single grapheme.
struct line_t
{
    int code_units_ = 0;
    int graphemes_ = 0;
    bool hard_break_ = false;
};

// Represents an applcation state that we may want to return to later, say
// becauee of an undo operation.
struct snapshot_t
{
    content_t content_;
    boost::text::segmented_vector<line_t> lines_;
    int first_row_ = 0;
    int desired_col_ = 0;
    screen_pos_t cursor_pos_;
    std::ptrdiff_t first_char_index_ = 0;
};

// A single editable buffer in the editor (the editor currently only supports
// one).  It contains a current state, snapshot_, plus a copy of the content
// so we know what the latest saved state, and a history for performing undo
// operations.
//
// Note that the only element here that is not cheap to copy is history_.  Due
// to history_'s unbounded size and potentially costly copy, you should move
// buffer_t's when possible.
struct buffer_t
{
    snapshot_t snapshot_;
    content_t latest_save_;
    boost::filesystem::path path_;
    std::vector<snapshot_t> history_;
};

// This is our dirty-buffer predicate.  We use this to show a '**' in the
// status line when there are unsaved changes.  Try implementing this without
// the equal_root() trick here -- it's very difficult to get right.
inline bool dirty(buffer_t const & buffer)
{
    return !buffer.snapshot_.content_.equal_root(buffer.latest_save_);
}

inline int cursor_line(snapshot_t const & snapshot)
{
    return snapshot.first_row_ + snapshot.cursor_pos_.row_;
}

inline bool cursor_at_last_line(snapshot_t const & snapshot)
{
    return cursor_line(snapshot) == (std::ptrdiff_t)snapshot.lines_.size();
}

struct cursor_iterators_t
{
    content_t::iterator first_;
    content_t::iterator cursor_;
    content_t::iterator last_;
};

inline content_t::iterator
iterator_at_start_of_line(snapshot_t const & snapshot, std::ptrdiff_t line_index)
{
    assert(snapshot.first_row_ <= line_index);

    if (line_index == (std::ptrdiff_t)snapshot.lines_.size())
        return snapshot.content_.end();

    std::ptrdiff_t offset = snapshot.first_char_index_;
    for (std::ptrdiff_t i = snapshot.first_row_; i < line_index; ++i) {
        offset += snapshot.lines_[i].code_units_;
    }

    // Here we peel back the curtain a bit and show how a graphem_iterator is
    // constructed from code point iterators constructed from char iterators.
    auto const first = snapshot.content_.begin().base().base();
    auto const it = first + offset;
    auto const last = snapshot.content_.end().base().base();
    return {content_t::iterator::iterator_type{first, first, last},
            content_t::iterator::iterator_type{first, it, last},
            content_t::iterator::iterator_type{first, last, last}};
}

inline cursor_iterators_t cursor_iterators(snapshot_t const & snapshot)
{
    auto const line_index = cursor_line(snapshot);
    if (line_index == (std::ptrdiff_t)snapshot.lines_.size()) {
        return {snapshot.content_.end(),
                snapshot.content_.end(),
                snapshot.content_.end()};
    }

    auto const line_grapheme_first =
        iterator_at_start_of_line(snapshot, line_index);
    return {line_grapheme_first,
            std::next(line_grapheme_first, snapshot.cursor_pos_.col_),
            std::next(
                line_grapheme_first,
                snapshot.lines_[snapshot.cursor_pos_.row_].graphemes_)};
}

struct cursor_word_t
{
    boost::text::grapheme_view<content_t::iterator::iterator_type> word_;
    content_t::iterator cursor_;
};

inline cursor_word_t cursor_word(snapshot_t const & snapshot)
{
    if (cursor_at_last_line(snapshot)) {
        return {
            {snapshot.content_.end().base(), snapshot.content_.end().base()},
            snapshot.content_.end()};
    }
    auto const iterators = cursor_iterators(snapshot);
    return {boost::text::word(snapshot.content_, iterators.cursor_),
            iterators.cursor_};
}

// This will fill container with pairs of code unit and grapheme extents for
// each line.  A line may end in a hard line break like '\n', or it mey be
// broken because there's just no room left on that line within the screen
// width.  This works for getting the line breaks for all lines in the buffer
// contents, and we can reuse it to re-break lines as we edit the buffer too.
template<typename GraphemeRange, typename LinesContainer>
inline void get_lines(
    GraphemeRange const & range, int screen_width, LinesContainer & container)
{
    for (auto line : boost::text::lines(
             range,
             screen_width - 1,
             [](content_t::const_iterator::iterator_type first,
                content_t::const_iterator::iterator_type last) noexcept {
                 boost::text::grapheme_view<
                     content_t::const_iterator::iterator_type>
                     range(first, last);
                 return std::distance(range.begin(), range.end());
             })) {
        // Note that we don't count the terminating hard line break grapheme
        // here.  This requires special casing in some places, but makes much
        // of the logic simpler.
        line_t const line_{
            int(line.end().base().base() - line.begin().base().base()),
            (int)std::distance(line.begin(), line.end()) -
                (line.hard_break() ? 1 : 0),
            line.hard_break()};
        container.push_back(line_);
    }
}

inline buffer_t load_buffer(boost::filesystem::path path, int screen_width)
{
    boost::filesystem::ifstream ifs(path);

    buffer_t retval;
    retval.path_ = std::move(path);

    int const chunk_size = 1 << 16;
    while (ifs.good()) {
        std::string chunk;
        chunk.resize(chunk_size, ' ');

        ifs.read(const_cast<char *>(chunk.data()), chunk_size);
        if (!ifs.good())
            chunk.resize(ifs.gcount(), ' ');

        retval.snapshot_.content_ += std::move(chunk);
    }

    get_lines(retval.snapshot_.content_, screen_width, retval.snapshot_.lines_);

    retval.latest_save_ = retval.snapshot_.content_;
    retval.history_.push_back(retval.snapshot_);

    return retval;
}

inline void save_buffer(boost::filesystem::path path, buffer_t const & buffer)
{
    boost::filesystem::ofstream ofs(path);
    ofs << buffer.snapshot_.content_;
}

#endif

Here is the full buffer.hpp header. The inline comments tell the story. The types here used in the app_state_t type:

#ifndef EDITOR_APP_STATE_HPP
#define EDITOR_APP_STATE_HPP

#include "buffer.hpp"
#include "event.hpp"
#include "key_mappings.hpp"

#include <boost/optional.hpp>


// The state of our entire editor.  This consists of a single buffer, the key
// bindings in use, and the current incomplete key sequence, if any.
//
// Due to buffer_'s unbounded history size and therefore potentially costly
// copy, you should move app_state_t's when possible.
struct app_state_t
{
    buffer_t buffer_;
    key_map_t key_map_;
    key_sequence_t input_seq_;
};

// Takes the current state of the editor and an event, and returns the new
// state.  It is expected that when the returned optional is empty, the editor
// exits.
boost::optional<app_state_t> update(app_state_t state, event_t event);

#endif

Ok, with this in mind, let's look at the keybindings and update() we saw in main:

key_map_t emacs_lite()
{
    key_map_t retval = {
        key_map_entry_t{ctrl - 'p', move_up},
        key_map_entry_t{up, move_up},
        key_map_entry_t{ctrl - 'n', move_down},
        key_map_entry_t{down, move_down},
        key_map_entry_t{ctrl - 'b', move_left},
        key_map_entry_t{left, move_left},
        key_map_entry_t{ctrl - 'f', move_right},
        key_map_entry_t{right, move_right},

        key_map_entry_t{ctrl - 'a', move_home},
        key_map_entry_t{home, move_home},
        key_map_entry_t{ctrl - 'e', move_end},
        key_map_entry_t{end, move_end},

        key_map_entry_t{page_up, move_page_up},
        key_map_entry_t{page_down, move_page_down},

        key_map_entry_t{left_click, click},
        key_map_entry_t{left_double_click, double_click},
        key_map_entry_t{left_triple_click, triple_click},

        key_map_entry_t{backspace, erase_before},
        key_map_entry_t{delete_, erase_at},

        key_map_entry_t{alt - 'f', word_move_right},
        key_map_entry_t{alt - 'b', word_move_left},

        key_map_entry_t{ctrl - '_', undo},

        key_map_entry_t{(ctrl - 'x', ctrl - 's'), save},
        key_map_entry_t{(ctrl - 'x', ctrl - 'c'), quit},
    };

    return retval;
}

boost::optional<app_state_t> update(app_state_t state, event_t event)
{
    state.input_seq_.append(event.key_code_);
    eval_input_t const input_evaluation =
        eval_input(state.key_map_, state.input_seq_);
    if (input_evaluation.reset_input_)
        state.input_seq_ = key_sequence_t();
    if (input_evaluation.command_) {
        screen_pos_t const xy{event.key_code_.y_, event.key_code_.x_};
        return input_evaluation.command_(
            std::move(state), event.screen_size_, xy);
    }
    return std::move(state);
}

They key bindings are quite minimal. You can use the arrow keys to move the cursor one cell an any direction; you can move forward or backward one word at a time; you can click on a spot to move the cursor there; you can use backspace and delete; you can undo; and you can save and exit.

As for update(), it simply calls eval_input(), and if eval_input() indicates that a command should be executed, it executes that command.

struct eval_input_t
{
    command_t command_;
    bool reset_input_;
};

// Process the current input.  If it is a sequence, try to match it to the
// key bindings.  If it does not match any prefix of any key binding,
// indicate that the current input sequence should be cleared.  Failing
// all that, treat the input as a keyboard input,
eval_input_t eval_input(key_map_t const & key_map, key_sequence_t input_seq)
{
    bool input_greater_than_all = true;
    for (auto const & key_map_entry : key_map) {
        auto const iters = boost::algorithm::mismatch(
            input_seq.begin(),
            input_seq.end(),
            key_map_entry.key_seq_.begin(),
            key_map_entry.key_seq_.end());

        if (iters.first == input_seq.end()) {
            if (iters.second == key_map_entry.key_seq_.end())
                return eval_input_t{key_map_entry.command_, true};
            input_greater_than_all = false;
        }
    }

    if (input_seq.single_key()) {
        auto const key_code = input_seq.get_single_key();
        if (key_code.key_ <= 256) {
            if (key_code.key_ == '\n') {
                return eval_input_t{insert(boost::text::grapheme('\n')),
                                    true};
            } else if (
                ' ' <= key_code.key_ && key_code.key_ <= '~' &&
                boost::text::valid_code_point(key_code.key_)) {
                // We ignore anything not in this narrow range of ASCII.
                // This is a limitation of my ability to understand
                // libcurses, not a limitation of Boost.Text.
                return eval_input_t{
                    insert(boost::text::grapheme(key_code.key_)), true};
            }
        }
    }

    return eval_input_t{{}, input_greater_than_all};
}

As it says inline, this function simply processes keyboard input as it occurs, indicating that a command should be executed when the input matches one of the sequences defined in the key bindings. Now let's look at the implementations of two of the cursor movement actions:

// Move the cursor up one row.  Put the cursor at or before the desired
// column.  Slide the buffer view up one row if necessary.
boost::optional<app_state_t>
move_up(app_state_t state, screen_pos_t, screen_pos_t)
{
    auto & s = state.buffer_.snapshot_;
    if (cursor_line(s) == 0)
        return std::move(state);
    decrement_cursor_row(s);
    if (s.cursor_pos_.col_ != s.desired_col_)
        s.cursor_pos_.col_ = s.desired_col_;
    int const line =
        cursor_at_last_line(s) ? 0 : s.lines_[cursor_line(s)].graphemes_;
    if (line - 1 < s.cursor_pos_.col_)
        s.cursor_pos_.col_ = line;
    return std::move(state);
}

// Move the cursor left one column/grapheme, wrapping around to the first
// column of the next row if necessary.  Slide the buffer view down one
// row if necessary.
boost::optional<app_state_t>
move_right(app_state_t state, screen_pos_t screen_size, screen_pos_t)
{
    auto & s = state.buffer_.snapshot_;
    int const line =
        cursor_at_last_line(s) ? 0 : s.lines_[cursor_line(s)].graphemes_;
    if (s.cursor_pos_.col_ == line) {
        if (cursor_line(s) == max_row(s))
            return std::move(state);
        increment_cursor_row(s, screen_size);
        s.cursor_pos_.col_ = 0;
    } else {
        s.cursor_pos_.col_ += 1;
    }
    set_desired_col(s);
    return std::move(state);
}

The details here are not that important, but what I'm trying to show here is how relatively simple these implementations are. There are multiple cases that must be dealt with, but those are inherent to moving a cursor around in variable-length rows. What we don't have to deal with are all the cases that would come up if we dealt with text as code points or code units.

// Returns true if 'it' starts with a code point that has a property that
// is not a letter or number -- which is what the user intuitively expects
// a word to be.  Note that this only needs to be checked at word
// beginnings and endings, so we leave out the "Mid*" properties below.
bool non_word_grapheme(content_t::iterator it)
{
    auto const prop = boost::text::word_prop(*it->begin());
    return !(
        prop == boost::text::word_property::Katakana ||
        prop == boost::text::word_property::ALetter ||
        prop == boost::text::word_property::Numeric ||
        prop == boost::text::word_property::ExtendNumLet ||
        prop == boost::text::word_property::Regional_Indicator ||
        prop == boost::text::word_property::Hebrew_Letter ||
        prop == boost::text::word_property::ExtPict ||
        prop == boost::text::word_property::Extend);
}

// Move to the beginning of the current word if the cursor is in one, or
// the beginning of the previous word otherwise.  If the cursor is at the
// start of a word, though, move to the previous one.  "Word" in this
// context means those things with letters and/or numbers, not punctuation
// or whitespace.
boost::optional<app_state_t> word_move_left(
    app_state_t state, screen_pos_t screen_width, screen_pos_t xy)
{
    auto & s = state.buffer_.snapshot_;
    auto const word_and_it = cursor_word(s);
    auto it = word_and_it.cursor_;

    auto content_first = s.content_.begin();
    if (it == word_and_it.word_.begin()) {
        // We're already at the beginning of a word; back up to the
        // previous one.
        if (it != content_first)
            it = boost::text::prev_word_break(s.content_, std::prev(it));
    } else {
        it = word_and_it.word_.begin();
    }

    // The word break algorithm finds anything bounded by word breaks to
    // be a word, even whitespace sequences; keep backing up until we hit
    // a "word" that a starts with a letter, number, etc.
    while (it != content_first && non_word_grapheme(it)) {
        it = boost::text::prev_word_break(s.content_, std::prev(it));
    }

    std::ptrdiff_t graphemes_to_move =
        std::distance(it, word_and_it.cursor_);
    for (std::ptrdiff_t i = 0; i < graphemes_to_move; ++i) {
        state = *move_left(std::move(state), screen_width, xy);
    }

    return std::move(state);
}

Even this implementation, which is certainly more complicated than the previous two, is relatively simple thanks to the grapheme-based overload of boost::text::prev_word_break().

boost::optional<app_state_t>
erase_at(app_state_t state, screen_pos_t screen_size, screen_pos_t)
{
    auto & s = state.buffer_.snapshot_;

    auto const cursor_its = cursor_iterators(s);
    if (cursor_at_last_line(s) || cursor_its.cursor_ == s.content_.end())
        return std::move(state);

    state.buffer_.history_.push_back(s);

    auto const cursor_grapheme_cus =
        boost::text::storage_code_units(*cursor_its.cursor_);
    auto line_index = cursor_line(s);

    auto line = s.lines_[line_index];
    if (cursor_its.cursor_ == cursor_its.last_ && line.hard_break_) {
        // End-of-line hard breaks are a special case.
        line.hard_break_ = false;
        line.code_units_ -= 1;
    } else {
        // The cursor may sit just past the last grapheme on a line with
        // no hard break, so the erase may need to happen on the next
        // line.  But who cares!  We're going to re-break this line and
        // all lines until the next hard break anyway.
        line.code_units_ -= cursor_grapheme_cus;
        line.graphemes_ -= 1;
    }

    // Erasing a single grapheme in the first word on row N can allow that
    // word to fit on row N-1, so we need to start from there when
    // re-breaking lines.
    auto rebreak_line_index = line_index;
    if (line_index != 0 && !s.lines_[line_index - 1].hard_break_) {
        --rebreak_line_index;
        decrement_cursor_row(s);
    }

    // If there are previous and next lines, combine them with the current
    // line before we re-break lines below.  This gets rid of special
    // cases like when we're erasing at a spot one past the end of the
    // current line (i.e. the first grapheme of the next line.)
    if (!line.hard_break_ && line_index + 1 < (std::ptrdiff_t)s.lines_.size()) {
        auto const next_line = s.lines_[line_index + 1];
        line.code_units_ += next_line.code_units_;
        line.graphemes_ += next_line.graphemes_;
        line.hard_break_ = next_line.hard_break_;
        s.lines_.erase(s.lines_.begin() + line_index + 1);
    }

    if (rebreak_line_index != line_index) {
        auto const prev_line = s.lines_[rebreak_line_index];
        line.code_units_ += prev_line.code_units_;
        line.graphemes_ += prev_line.graphemes_;
        s.lines_.erase(s.lines_.begin() + rebreak_line_index);
        s.cursor_pos_.col_ += prev_line.graphemes_;
    }

    s.lines_.replace(s.lines_.begin() + rebreak_line_index, line);
    s.content_.erase(cursor_its.cursor_, std::next(cursor_its.cursor_));
    rebreak_wrapped_line(state, rebreak_line_index, screen_size);

    return std::move(state);
}

This is the action invoked when you hit the delete key. It erases a single grapheme that is under the cursor, if any. Note that we're inserting and replacing elements in a sequence container of unbounded size. This is still an efficient operation because our sequence container is a segmented_vector. Even if there are millions of lines, inserting and erasing is an O(1) operation at every place in the sequence (see the unencoded_rope part of the string layer documentation for why).

// When an insertion or erasure happens on line line_index, and line_index
// does not end in a hard break, we need to re-break the lines from the
// line of the edit to the next hard-break-line, or the end if there is no
// later hard break.
void rebreak_wrapped_line(
    app_state_t & state,
    std::ptrdiff_t line_index,
    screen_pos_t screen_size)
{
    auto & s = state.buffer_.snapshot_;
    assert(line_index < (std::ptrdiff_t)s.lines_.size());

    auto const lines_it = s.lines_.begin() + line_index;
    auto lines_last =
        std::find_if(lines_it, s.lines_.end(), [](line_t line) {
            return line.hard_break_;
        });
    if (lines_last != s.lines_.end())
        ++lines_last;

    auto total_graphemes = std::accumulate(
        lines_it, lines_last, 0, [](int result, line_t line) {
            return result + line.graphemes_;
        });
    // Include the hard line break grapheme, if any.
    int const hard_break_grapheme = int(lines_last != s.lines_.end());

    auto const grapheme_first = iterator_at_start_of_line(s, line_index);
    auto const grapheme_last =
        std::next(grapheme_first, total_graphemes + hard_break_grapheme);
    boost::text::grapheme_view<content_t::iterator::iterator_type> const
        graphemes(grapheme_first, grapheme_last);

    std::vector<line_t> replacements;
    get_lines(graphemes, screen_size.col_, replacements);

    // If the edited text ends in a hard line break, we'll end up with an
    // extra line with no code units or graphemes.  If so, keep that.
    if (!std::prev(lines_last)->code_units_)
        replacements.push_back(*std::prev(lines_last));

    s.lines_.replace(lines_it, lines_last, std::move(replacements));
}

In erase_at(), we could take some shortcuts because we knew we were going to re-break from the line where the cursor is until the next line break. We have to do this because a line of text might be broken into an arbitrary number of lines in our editor. If we make an edit on line N, it may end up affecting line breaks all the way to line N + 1000. Notice that again the logic is pretty straightforward. Almost all the logic is devoted to figuring out the range of graphemes we want to re-break. The actual line breaking is just a call to get_lines(), which is itself just a single call to boost::text::lines(), as we saw earlier.

// Returns a callable that inserts the given grapheme into the text.
command_t insert(boost::text::grapheme grapheme)
{
    return [grapheme](
               app_state_t state,
               screen_pos_t screen_size,
               screen_pos_t xy) -> boost::optional<app_state_t> {
        auto & snapshot = state.buffer_.snapshot_;
        state.buffer_.history_.push_back(snapshot);

        // Determine the line we're at, the iterators into the text for
        // that line and poisition, and the offset within the line, in
        // terms of code units and graphemes.
        auto const line_index = cursor_line(snapshot);
        auto const cursor_its = cursor_iterators(snapshot);
        int const code_unit_offset = cursor_its.cursor_.base().base() -
                                     cursor_its.first_.base().base();
        int const grapheme_offset = snapshot.cursor_pos_.col_;

        if (boost::text::storage_code_units(grapheme) == 1 &&
            *grapheme.begin() == '\n') {
            // Inserting a newline is a special case.  We need to mark the
            // current line as having a hard break, and tear off all the
            // code units and graphemes after the insertion point to form
            // a new line.
            snapshot.content_.insert(cursor_its.cursor_, grapheme);
            line_t line;
            if (line_index < (std::ptrdiff_t)snapshot.lines_.size())
                line = snapshot.lines_[line_index];

            line_t const new_line{
                line.code_units_ - code_unit_offset,
                line.graphemes_ - grapheme_offset,
                line.hard_break_};
            snapshot.lines_.insert(
                snapshot.lines_.begin() + line_index + 1, new_line);

            line.code_units_ = code_unit_offset + 1;
            line.graphemes_ = grapheme_offset;
            line.hard_break_ = true;
            snapshot.lines_.replace(
                snapshot.lines_.begin() + line_index, line);

            rebreak_wrapped_line(state, line_index + 1, screen_size);

            snapshot.cursor_pos_.col_ = 0;
            set_desired_col(snapshot);
            state = *move_down(std::move(state), screen_size, xy);
        } else {
            // Insertion of a non-newline is the general case.  First, we
            // need to figure out if the insertion affects the cursor
            // grapheme or the one previous. See
            // grapheme_insertion_deltas() for why.
            using graphme_view = decltype(*cursor_its.first_);
            graphme_view prev_grapheme;
            if (cursor_its.cursor_ != snapshot.content_.begin())
                prev_grapheme = *std::prev(cursor_its.cursor_);
            graphme_view next_grapheme;
            if (cursor_its.cursor_ != snapshot.content_.end())
                next_grapheme = *cursor_its.cursor_;
            auto const deltas = grapheme_insertion_deltas(
                prev_grapheme, grapheme, next_grapheme);

            auto rebreak_line_index = line_index;
            auto curr_line_delta = deltas.next_;
            if (!snapshot.cursor_pos_.col_ && deltas.prev_.code_units_) {
                // For changes to the previous grapheme when we're at the
                // start of the line:
                assert(0 < deltas.prev_.code_units_);
                assert(!deltas.prev_.graphemes_);
                line_t prev_line = snapshot.lines_[line_index - 1];
                prev_line.code_units_ += deltas.prev_.code_units_;
                prev_line.graphemes_ += deltas.prev_.graphemes_;
                snapshot.lines_.replace(
                    snapshot.lines_.begin() + line_index - 1, prev_line);
                --rebreak_line_index;
            } else {
                // For the typical case, in which only the cursor grapheme
                // is affected:
                curr_line_delta.code_units_ += deltas.prev_.code_units_;
                curr_line_delta.graphemes_ += deltas.prev_.graphemes_;
            }

            // Modify the current line, or create a new one.
            line_t line;
            if (!cursor_at_last_line(snapshot))
                line = snapshot.lines_[line_index];
            line.code_units_ += curr_line_delta.code_units_;
            line.graphemes_ += curr_line_delta.graphemes_;
            if (cursor_at_last_line(snapshot)) {
                snapshot.lines_.insert(snapshot.lines_.end(), line);
            } else {
                snapshot.lines_.replace(
                    snapshot.lines_.begin() + line_index, line);
            }

            snapshot.content_.insert(cursor_its.cursor_, grapheme);
            rebreak_wrapped_line(state, rebreak_line_index, screen_size);

            auto const cursor_line = snapshot.lines_[line_index];
            ++snapshot.cursor_pos_.col_;
            if (cursor_line.graphemes_ < snapshot.cursor_pos_.col_) {
                snapshot.cursor_pos_.col_ =
                    snapshot.cursor_pos_.col_ - cursor_line.graphemes_;
                increment_cursor_row(snapshot, screen_size);
            }
        }

        return std::move(state);
    };
}

Finally, we see an implementation that's pretty complicated. Part of the reason for this is that I've chosen to treat hard line breaks specially, so there's special handling for inserting one. But the main complication is that graphemes themselves are complicated. This is because they are context sensitive — the code points before and after a grapheme may affect that grapheme's boundaries. The next function explains why. Note that erases don't have to deal with this, because they always deal in whole graphemes.

// Ok, this is a little weird.  When you insert a grapheme into text,
// sometimes it just vanishes, sort of.  Unicode!  In particular, a
// combining code point like a 'COMBINING DIAERESIS' (U+0308) is a
// grapheme when it's all by itself.  So, if you insert this particular
// grapheme at location 'it', the grapheme at location 'std::prev(it)'
// might just absorb your inserted grapheme (the combining diaresis).
// This function tries to insert a grapheme between two other graphemes,
// either or both of which may be empty.  From how and if the inserted
// grapheme is absorbed (or absorbs -- Unicode!) its neighbors, we can
// determine the number of inserted graphemes and code points that
// results.
template<typename CPIter>
edit_deltas grapheme_insertion_deltas(
    boost::text::grapheme_ref<CPIter> prev_grapheme,
    boost::text::grapheme insertion,
    boost::text::grapheme_ref<CPIter> next_grapheme)
{
    boost::text::text t;
    t.insert(t.end(), prev_grapheme);
    auto const r = t.insert(t.end(), next_grapheme);
    auto const initial_distance = t.distance();
    t.insert(r.begin(), insertion);
    auto const distance = t.distance();
    if (distance == initial_distance + 1) {
        return {{}, {boost::text::storage_code_units(insertion), 1}};
    } else {
        assert(distance == initial_distance);
        int const prev_sizes[2] = {
            boost::text::storage_code_units(prev_grapheme),
            boost::text::storage_code_units(next_grapheme)};
        int const sizes[2] = {
            boost::text::storage_code_units(*t.begin()),
            boost::text::storage_code_units(*std::next(t.begin()))};
        return {
            {sizes[0] - prev_sizes[0], 0}, {sizes[1] - prev_sizes[1], 0}};
    }
}

That's messed up. So when I insert arbitrary text that is considered a whole grapheme, it may contextually stop being a whole grapheme, and instead be merged into another existing one. Still, figuring out if and how this happens is not much code.

[Note] Note

grapheme_insertion_deltas() probably doesn't need to allocate, considering that text is implemented in terms of std::string, which has a small-buffer optimization. Very large graphemes will of course make this function allocate. I could also use a fixed-capacity array of 4 * 32 * 3 bytes and be guaranteed not to allocate. The 4 is for the maximum code units per code point, the 32 is for the maximum code points per grapheme (because of the Stream-Safe Format assumption), and the 3 is the number of graphemes we're working with.

// Undo by restoring the most recent state in the history.  Note that this
// undo is destructive, since there's no redo to worry about implementing.
boost::optional<app_state_t>
undo(app_state_t state, screen_pos_t, screen_pos_t)
{
    state.buffer_.snapshot_ = state.buffer_.history_.back();
    if (1 < state.buffer_.history_.size())
        state.buffer_.history_.pop_back();
    return std::move(state);
}

boost::optional<app_state_t> save(app_state_t state, screen_pos_t, screen_pos_t)
{
    save_buffer(state.buffer_.path_, state.buffer_);
    state.buffer_.latest_save_ = state.buffer_.snapshot_.content_;
    return std::move(state);
}

boost::optional<app_state_t> quit(app_state_t, screen_pos_t, screen_pos_t)
{
    return boost::none;
}

The implementation of undo() is a punchline of sorts. If you look back at all the branches required to implement all the actions in the editor, it seems stunning that the undo action could be so simple. It's also surprising that it does what looks like a bulk copy in O(1), making it efficient as well.


PrevUpHomeNext