Re: General structure of disassembler



Last night I wrote ...

> I have a lot of hobbies, and one of these is writing disassemblers for
> various CPUs and data file formats. Unfortunately I always seem to run
> into
> the same problem: data representation!
> The usual approach is to start with a fixed amount of undifferentiated
> 'source' -- usually large to *very* large (well in the Mb's). I want to
> display this, and have the user (or the program itself) group bytes
> together
> as opcodes, strings, data, whatever.
> Additionally, I'd like to insert comments next to this defined bytes as
> well
> as 'any' text on the lines immediately above or below (think labels and
> such).
> I have experimented both with flags connected directly to the source
> bytes,
> and dumping the whole source into a linked list structure. The pro of the
> first option is that display-to-offset is 1-1 -- the con being that (a)
> either it is impossible to insert additional lines, or (b) with a flag
> 'additional lines available', each source byte has to be scanned for this
> value whenever the cursor moves up or down a few lines. The pro of the
> linked lists is that it's indifferent to line insertions. Each linked list
> item knows to which source byte it points and how many of these are
> 'included' into its line -- a number which also may be zero. The con is,
> we're talking about MBs here! On loading a large file creating the initial
> (untagged) state costs seconds, even on my 3GHz, and when that's done
> moving
> down to the end also takes a long time.
> I have an idea about combining the two approaches (linked lists which may
> also span multiple lines, thus circumventing the initial load delay) but I
> foresee that when nearing complete disassembly almost every single byte
> will
> have its own link -- again, thousands and thousands of linked list items,
> so
> while initially the program may run like hell, at the end it will... run
> like hell, I suppose.
> Any idea's?

... and after a good night's sleep (not!) came up with the following. The
good ol' binary tree.
- store the first line number (= 0) and count (= total # lines) in a tree
- divide the source in two blocks, storing the first line number and count
of each of those 2 blocks
- keep doing that until each individual block is easy to manage with
sequential linked lists (a few k's should do it)

When lines are split or joined all I have to do is propagate the new line
number and counts up its own leaf of the tree up to the root. In that way
the root node will always hold the total number of lines, and finding any
individual line is easy enough. Joining and splitting lines will change the
balance of the tree -- but probably not enough to justify re-iterating.
Anyway, this structure doesn't even have to be saved along with the data, it
can be rebuilt easy enough.
The only caveat is when the last line of a block is joined to the first of
the next but I guess that's left as an exercise to the programmer. (Oh dear,
that's me...)
Since this is about disassembling there should also be a pointer to data
bytes and a count of those in each block, but that can be handled in exactly
the same way.
Any comments on this?

[jongware]


.



Relevant Pages

  • Re: General structure of disassembler
    ... various CPUs and data file formats. ... the same problem: data representation! ... I have an idea about combining the two approaches (linked lists which may ... there would probably be a choice of a block doing 'absolute addressing' vs addressing relative to it's enclosing ...
    (comp.programming)
  • Re: If you were inventing CoBOL...
    ... Strangely they are also slower in Fujitsu. ... inherently slower even when they are simple single linked lists. ... This does not make 'tree lookups faster'. ... one must add sort time to the comparison. ...
    (comp.lang.cobol)
  • Re: If you were inventing CoBOL...
    ... >in the unbalanced tree while you're sorting. ... >putting the logic in a library called by the SORT verb. ... >>encourage to code up double linked lists and tree traversals? ...
    (comp.lang.cobol)
  • General structure of disassembler
    ... I have a lot of hobbies, and one of these is writing disassemblers for ... various CPUs and data file formats. ... I have an idea about combining the two approaches (linked lists which may ... while initially the program may run like hell, ...
    (comp.programming)
  • Re: If you were inventing CoBOL...
    ... I'll build a tree. ... putting the logic in a library called by the SORT verb. ... >encourage to code up double linked lists and tree traversals? ... >> is analogous to the sort time, but tree lookups don't REQUIRE that the ...
    (comp.lang.cobol)