Re: General structure of disassembler
- From: "[Jongware]" <sorry@xxxxxxxxxxx>
- Date: Mon, 30 Jan 2006 11:11:32 +0100
Last night I wrote ...
> I have a lot of hobbies, and one of these is writing disassemblers for
> various CPUs and data file formats. Unfortunately I always seem to run
> into
> the same problem: data representation!
> The usual approach is to start with a fixed amount of undifferentiated
> 'source' -- usually large to *very* large (well in the Mb's). I want to
> display this, and have the user (or the program itself) group bytes
> together
> as opcodes, strings, data, whatever.
> Additionally, I'd like to insert comments next to this defined bytes as
> well
> as 'any' text on the lines immediately above or below (think labels and
> such).
> I have experimented both with flags connected directly to the source
> bytes,
> and dumping the whole source into a linked list structure. The pro of the
> first option is that display-to-offset is 1-1 -- the con being that (a)
> either it is impossible to insert additional lines, or (b) with a flag
> 'additional lines available', each source byte has to be scanned for this
> value whenever the cursor moves up or down a few lines. The pro of the
> linked lists is that it's indifferent to line insertions. Each linked list
> item knows to which source byte it points and how many of these are
> 'included' into its line -- a number which also may be zero. The con is,
> we're talking about MBs here! On loading a large file creating the initial
> (untagged) state costs seconds, even on my 3GHz, and when that's done
> moving
> down to the end also takes a long time.
> I have an idea about combining the two approaches (linked lists which may
> also span multiple lines, thus circumventing the initial load delay) but I
> foresee that when nearing complete disassembly almost every single byte
> will
> have its own link -- again, thousands and thousands of linked list items,
> so
> while initially the program may run like hell, at the end it will... run
> like hell, I suppose.
> Any idea's?
... and after a good night's sleep (not!) came up with the following. The
good ol' binary tree.
- store the first line number (= 0) and count (= total # lines) in a tree
- divide the source in two blocks, storing the first line number and count
of each of those 2 blocks
- keep doing that until each individual block is easy to manage with
sequential linked lists (a few k's should do it)
When lines are split or joined all I have to do is propagate the new line
number and counts up its own leaf of the tree up to the root. In that way
the root node will always hold the total number of lines, and finding any
individual line is easy enough. Joining and splitting lines will change the
balance of the tree -- but probably not enough to justify re-iterating.
Anyway, this structure doesn't even have to be saved along with the data, it
can be rebuilt easy enough.
The only caveat is when the last line of a block is joined to the first of
the next but I guess that's left as an exercise to the programmer. (Oh dear,
that's me...)
Since this is about disassembling there should also be a pointer to data
bytes and a count of those in each block, but that can be handled in exactly
the same way.
Any comments on this?
[jongware]
.
- Follow-Ups:
- Re: General structure of disassembler
- From: moi
- Re: General structure of disassembler
- References:
- General structure of disassembler
- From: [jongware]
- General structure of disassembler
- Prev by Date: Re: What's the weirdest filesystem out there?
- Next by Date: Re: What's the weirdest filesystem out there?
- Previous by thread: General structure of disassembler
- Next by thread: Re: General structure of disassembler
- Index(es):
Relevant Pages
|
|