8051 Speed Optimization

I'm optimizing code for a C8051F120 that has to run as fast as
possible. The few lines below are executed over and over; the looping
itself is a straightforward djnz. The bulk of the CPU time is spent on
these lines, which are repeated 8 times per loop (cut and paste, with
only the binary mask value changed).

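; for reference
rLEVEL   equ  R0          ; current threshold level
mDATAOUT DATA 64          ; output byte in internal RAM (address 40h)

; the code
row0:
        movx A, @DPTR             ; fetch the next data byte from external RAM
        inc  DPTR                 ; advance to the following byte
        subb A, rLEVEL            ; A = A - rLEVEL - C; C is set on borrow
        jc   row1                 ; borrow: leave this bit clear
        orl  mDATAOUT, #00000001b ; no borrow: set bit 0
row1: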

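At row1 the next copy of those 5 lines executes. Essentially, each copy
takes the data byte at @DPTR, compares it to the current value of
rLEVEL, and sets a bit in mDATAOUT if the byte is not less than rLEVEL
(subb also subtracts the incoming carry, so this assumes the carry is
clear on entry; when the previous jc was taken, the carry is still set
and the test becomes strictly greater). It does this for 8 sequential
bytes at DPTR, one for each bit of the mask (#00000001b through
#10000000b).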
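
For readers who want to check an optimized version against the intended result, here is a minimal C reference model of one pass over 8 bytes. It is only a sketch under stated assumptions: the function name pack_threshold_bits is made up for illustration, mDATAOUT is assumed to start at zero, and the carry is assumed clear before every subb, so the test is "data >= level".

```c
#include <stdint.h>

/* Hypothetical reference model of the 8 unrolled compare blocks.
 * Assumptions (not stated in the original code):
 *   - the output byte starts at 0 before the first block,
 *   - the carry flag is clear before each subb, so the condition
 *     for setting a bit is data[i] >= level.
 */
uint8_t pack_threshold_bits(const uint8_t *data, uint8_t level)
{
    uint8_t out = 0;

    for (uint8_t bit = 0; bit < 8; bit++) {
        /* Corresponds to: subb A, rLEVEL / jc next / orl mDATAOUT, #mask */
        if (data[bit] >= level) {
            out |= (uint8_t)(1u << bit);
        }
    }
    return out;
}
```

Byte 0 maps to mask #00000001b and byte 7 to #10000000b, matching the order of the unrolled blocks.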

Can anyone see a way to shave some cycles off this process? For
reference, this chip is doing video output. Even a single-cycle saving
matters, given how often this block is executed.

Thanks to anyone who can help.

Alex McHale
