Re: 8052 emulator in C



On Thu, 26 May 2011 10:11:01 -0600, hamilton <hamilton@xxxxxxxxxxx>
wrote:

On 5/25/2011 11:22 PM, George Neuner wrote:
On Tue, 24 May 2011 13:46:31 -0600, hamilton<hamilton@xxxxxxxxxxx>

I took a compiler class 30 years ago, and my professor at the time
stated that it was not possible.
With the better compilers available today it would be even more impossible.

Either your professor was mistaken or you misunderstood.

The point of decompiling is not to recover the original code, but
rather simply to get something that's easier to work with than an
assembler listing.

Hmmm, sounds like you agree with that statement.

Reverse engineering can be done in many ways, not just looking directly
at the code.

hamilton

It wasn't clear to me what you meant by "impossible".

Some languages - e.g. Java, C#, VB, etc. - which compile to canonized
byte code and carry meta-information in the binary can be decompiled
nearly perfectly.

In the general case of a HLL compiled to assembler, it is possible to
recover the compiler's generated template code ... however that may
not match up with the source.

For example, on most CPUs a backward conditional branch is faster than
a forward conditional branch ... so virtually any kind of loop will be
rearranged so that the exit test is at the end. So given the
following C code:

for ( i = 25; i <= 73; ++i )
{
:
}

or

i = 25;
while ( i <= 73 )
{
:
++i;
}

a great many compilers will rearrange either loop into the equivalent:

i = 25;
if ( i > 73 ) goto END_LABEL;
START_LABEL:
:
++i;
if ( i <= 73 ) goto START_LABEL;
END_LABEL:

and then generate assembler to implement it.

A decompiler can fairly easily recover this kind of stylized code,
which itself is easily recognized by a programmer as a do-while loop
with a preceding skip test. A programmer seeing this should know (but
not care) that the original source might actually have been a for or a
while loop. A good decompiler might actually output the do-while loop
rather than the labeled goto version.
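
To make the equivalence concrete, here is a minimal sketch in C. The
loop body is elided in the examples above, so the accumulation used
here (summing i into total) is a hypothetical stand-in, as are the
function names and the lo/hi parameters. It shows the source-level for
loop, the labeled-goto shape a compiler might produce, and the guarded
do-while a decompiler might emit, and checks that they agree.

```c
#include <assert.h>

/* Hypothetical loop body: add i to a running total so the forms can be compared. */

/* Form 1: the for loop as it might appear in the original source. */
int sum_for(int lo, int hi)
{
    int total = 0;
    int i;
    for (i = lo; i <= hi; ++i)
        total += i;
    return total;
}

/* Form 2: the rearranged shape, with a skip test up front and the
   exit test at the bottom as a backward conditional branch. */
int sum_goto(int lo, int hi)
{
    int total = 0;
    int i = lo;
    if (i > hi) goto END_LABEL;
START_LABEL:
    total += i;
    ++i;
    if (i <= hi) goto START_LABEL;
END_LABEL:
    return total;
}

/* Form 3: what a decompiler might output, a do-while loop
   with a preceding skip test. */
int sum_do_while(int lo, int hi)
{
    int total = 0;
    int i = lo;
    if (i <= hi) {
        do {
            total += i;
            ++i;
        } while (i <= hi);
    }
    return total;
}

/* Aborts via assert if the three forms disagree for the range [lo, hi]. */
void check_forms(int lo, int hi)
{
    assert(sum_for(lo, hi) == sum_goto(lo, hi));
    assert(sum_for(lo, hi) == sum_do_while(lo, hi));
}
```

Calling check_forms(25, 73) exercises the range from the example, and
check_forms(5, 4) exercises the case where the skip test bypasses the
loop entirely. Nothing in form 3 tells you whether form 1 was
originally a for or a while loop, which is the point: the algorithm is
recovered even though the exact source is not.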

My point simply is that it is always possible to work backward from
assembler to an equivalent sequence in the HLL, and equivalency is
sufficient to recover algorithms from the original code.

George