Re: 8052 emulator in C

On Thu, 26 May 2011 10:11:01 -0600, hamilton <hamilton@xxxxxxxxxxx> wrote:

On 5/25/2011 11:22 PM, George Neuner wrote:
On Tue, 24 May 2011 13:46:31 -0600, hamilton <hamilton@xxxxxxxxxxx> wrote:

I took a compiler class 30 years ago, and my professor at the time
stated that it was not possible.
With the better compilers available today it would be even more impossible.

Either your professor was mistaken or you misunderstood.

The point of decompiling is not to recover the original code, but
rather simply to get something that's easier to work with than an
assembler listing.

Hmmm, sounds like you agree with that statement.

Reverse engineering can be done in many ways, not just looking directly
at the code.


It wasn't clear to me what you meant by "impossible".

Some languages - e.g. Java, C#, VB, etc. - which compile to canonized
byte code and carry meta-information in the binary can be decompiled
nearly perfectly.

In the general case of a HLL compiled to assembler, it is possible to
recover the compiler's generated template code ... however that may
not match up with the source.

For example, on most CPUs a backward conditional branch is faster than
a forward conditional branch ... so virtually any kind of loop will be
rearranged so that the exit test is at the end. So given either of the
following C loops:

for ( i = 25; i <= 73; ++i )
{
    /* loop body */
}

or

i = 25;
while ( i <= 73 )
{
    /* loop body */
    ++i;
}

a great many compilers will rearrange either loop into the equivalent:

    i = 25;
    if ( i > 73 ) goto END_LABEL;
START_LABEL:
    /* loop body */
    ++i;
    if ( i <= 73 ) goto START_LABEL;
END_LABEL:

and then generate assembler to implement it.

A decompiler can fairly easily recover this kind of stylized code,
which itself is easily recognized by a programmer as a do-while loop
with a preceding skip test. A programmer seeing this should know (but
not care) that the original source might actually have been a for or a
while loop. A good decompiler might actually output the do-while loop
rather than the labeled goto version.
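
To make the equivalence concrete, here is a minimal sketch of the three
shapes side by side. The loop body (summing i) and the function names
sum_for, sum_goto and sum_do_while are made up for illustration, and the
bounds are passed as parameters only so the zero-iteration case can be
tried as well. This is not the output of any particular compiler or
decompiler.

```c
/* Hypothetical loop body throughout: total += i. */

/* Shape 1: the loop as it might have been written in the source. */
int sum_for(int lo, int hi)
{
    int i, total = 0;
    for (i = lo; i <= hi; ++i)
        total += i;
    return total;
}

/* Shape 2: the rearranged form, with a skip test up front and the
   exit test moved to the bottom as a backward conditional branch. */
int sum_goto(int lo, int hi)
{
    int i, total = 0;
    i = lo;
    if (i > hi) goto END_LABEL;
START_LABEL:
    total += i;
    ++i;
    if (i <= hi) goto START_LABEL;
END_LABEL:
    return total;
}

/* Shape 3: what a decompiler might reasonably emit instead of the
   gotos: a do-while loop guarded by the preceding skip test. */
int sum_do_while(int lo, int hi)
{
    int i = lo, total = 0;
    if (i <= hi) {
        do {
            total += i;
            ++i;
        } while (i <= hi);
    }
    return total;
}
```

Called with (25, 73), all three return the same total, and with an empty
range such as (74, 73) all three return 0, which is exactly the case the
skip test exists to handle.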

My point simply is that it is always possible to work backward from
assembler to an equivalent sequence in the HLL, and equivalency is
sufficient to recover algorithms from the original code.