Re: Writing a Text Editor in Lisp
From: Robert Strandh (strandh_at_labri.fr)
Date: 09/12/04
- Next message: Roy Leonard: "Re: Newbie RFC: A utility macro"
- Previous message: Carl Shapiro: "Re: meaning of a dialet or implementation of a programming language"
- In reply to: David Steuber: "Writing a Text Editor in Lisp"
- Next in thread: John Thingstad: "Re: Writing a Text Editor in Lisp"
- Reply: John Thingstad: "Re: Writing a Text Editor in Lisp"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 12 Sep 2004 07:44:11 +0200
David Steuber <david@david-steuber.com> writes:
> So I would like to start off with a simple question. What is the best
> underlying data structure to hold the text to be manipulated? I've
> been thinking an array of strings (one string per line) would be the
> way to go. Or perhaps an array of arrays. I'm not presently worried
> about trying to deal with binary files, but you never know.
It is hard to say which is the "best" data structure without knowing
more about performance requirements and such.
GNU Emacs uses (or at least it did at some point, perhaps that has
changed) a gap buffer to hold the entire text of the buffer. It is a
fairly simple implementation that is efficient when a sequence of
insert and delete operations are close to each other in the buffer.
If they are far apart, the entire text may have to be moved at each
operation. This data structure makes it hard to determine information
about lines, such as which line point is on and how many lines there
are in the buffer.
Multics Emacs used a doubly linked list of strings, on string per
line. Each insert and delete operation moved every character to the
right of point on the same line. This structure works fine if lines
are short, but can be quite slow when lines are long. Though today
your average PC is about 1000 times as fast as the Multics machine at
the time, so this is probably not a huge problem. Searching for a
string with newlines in it gets more complicated than in the case of a
single gap buffer.
Hemlock uses a doubly linked list of lines. Each line except the
current (open) one is a string. The current one is implemented as a
gap buffer. A bad case for this structure would be alternating
editing operations on different lines, which would require conversions
between a string and a gap buffer at each operation. This problem
could be fixed by allowing several lines to be open and close them in
an LRU order.
It is probably not a good idea to expose the doubly linked list of
lines as part of the API. In Hemlock, given a line, the API allows
you to find the previous or the next line, which essentially
*requires* a doubly linked list. It then gets very hard to change the
data structure later if desired.
There is also the issue of "undo". If you store the text in a
tree-like data structure, each editing operation would cost O(log n)
which is not too bad on today's machines. You then get undo for free,
since you can easily share most of the tree between all versions. In
the structures above, you would have to save undo information for each
operation.
> Text editing is not exactly a new art. I've just never done it
> before. Are there any online sources that would be useful to me?
> What advice do you have?
The code of Portable Hemlock is mostly pretty easy to read, even
though some of it is pretty old looking with lots of structures
instead of classes and some optimizations that were probably necessary
at the time but that know mostly makes maintenance harder.
> A blue sky goal would be the functionality of Emacs. I also wish to
> keep display/input logic as seperate as possible from the actual
> buffer handling and text editing functions. That way, I can keep the
> back end ANSI Common Lisp and take advantage of implementation
> features for the UI and not have to rewrite the back end stuff if I
> port to a different CL implementation.
>
> There are some other things I'm not sure about. Apart from the raw
> text, there is usually other state information:
>
> * text insertion point
> * any marks
If you have a gap buffer, you can attach marks to physical locations
in the buffer so that marks do not need to be updated at each
operation. In the case of a string per line, you need to update all
marks on that line at each operation.
> * syntax highlighting (is this strictly a display issue?)
Pretty hard stuff. Essentially, you need to scan the entire buffer
from the beginning by a parser that knows the language you are
editing. GNU Emacs cheats by only backing up as far as a line with no
blanks in the first column.
You could also use a parser with explicit state (including the stack)
and attach that state every so often to a mark. If you have explicit
lines, you can attach this information to a line.
Of course, if you are editing Common Lisp, all bets are off if someone
changes the read tables. The best you could do would be to write your
own version of read with explicit state information, and only
highlight when you are in one of the predefined CL read functions.
> * paren matching
Essentially the same issue as syntax highlighting.
> * character encoding (ASCII, Unicode, etc)
If you use Unicode, you need 21 bits per character. It might then be
worthwhile to allow for a more compact representation of chunks of
text needing only 8 bits per character. You could use something like
UTF-8, but I suspect it would be a nightmare to implement even the
basic editing operations. If you have lines, each line could
potentially have a different representation.
> I've also considered writing a s-exp editor, but I have no idea how to
> deal with issues such as comments and case preservation. Also there
> are things like REPL and debugger functionality (any TTY type stuff
> really) that I would like to be able to handle. I would also like to
> be able to use this editor as the core for a Lisp IDE. Emacs seems to
> be a good model. I haven't studied Emacs sources either, nor do I
> know ELisp.
You would probably be better off using CLIM as the core of the IDE and
have the editor be just another CLIM pane.
> If I move on this, I will initially target OpenMCL and the Carbon API
> for the UI on OSX. Keeping the display logic separate from underlying
> buffer management and text editing functions is more of a hedge in
> case I feel the need to go with another Lisp (like SBCL on Linux or
> whatever). I also want to be able to do the text handling stuff
> before I worry too much about the UI.
If I were you, I would work on Portable Hemlock instead. As others
have pointed out, what you are trying to do is a pretty huge task (as
in you will most likely never be able to finish it, or even to get it
to a somewhat useful state). Starting with an existing implementation
of CLIM such as McCLIM and with a working (though in need for some
maintenance) implementation of the editor such as Hemlock would be a
lot easier. You are of course free to do what you want, but it would
probably be more useful to yourself and others if you would help out
with the Portable Hemlock project.
-- Robert Strandh
- Next message: Roy Leonard: "Re: Newbie RFC: A utility macro"
- Previous message: Carl Shapiro: "Re: meaning of a dialet or implementation of a programming language"
- In reply to: David Steuber: "Writing a Text Editor in Lisp"
- Next in thread: John Thingstad: "Re: Writing a Text Editor in Lisp"
- Reply: John Thingstad: "Re: Writing a Text Editor in Lisp"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|