Re: Fastest way to highlight printable ASCII?



Jim Leonard wrote:
Terje Mathisen wrote:
First: Is this actually intended for a real 8088 cpu?

Yes. I wouldn't be so concerned with printing strings quickly if it
weren't :-)

If so, then the only serious consideration is the size of your code,
since the actual runtime will always be pretty close to 4 clock cycles
multiplied by the total number of byte load or store operations,
including opcode bytes!

Well... a jump is 17 cycles, whereas one that "falls through" is 4, so
it's not quite that cut-and-dried, I've been finding. But I hear you loud
and clear -- the only way I was truly able to eliminate CGA snow came
down to choosing between "MOV AX,BX" and "XCHG BX,AX" -- the XCHG was 1
byte shorter, which gave me the time I needed.
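
To put rough numbers on the rule of thumb above, here is a small sketch in C (not 8088 assembly). The function name estimate_cycles is made up for illustration, and the model deliberately ignores jumps and slow instructions, which is exactly the caveat raised in the reply.

```c
/* Rule-of-thumb 8088 cost: about 4 clocks for every byte moved over the
   8-bit bus, counting opcode bytes as well as data bytes. Taken jumps
   and slow instructions (MUL, DIV, ...) are NOT modelled here. */
static unsigned long estimate_cycles(unsigned long opcode_bytes,
                                     unsigned long data_bytes)
{
    return 4UL * (opcode_bytes + data_bytes);
}
```

"MOV AX,BX" encodes in 2 bytes and "XCHG BX,AX" in 1, so under this model estimate_cycles(2, 0) is 8 and estimate_cycles(1, 0) is 4, which is the kind of saving described above.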

The "sub al,33" comment was the main thing I was missing; it completely
eliminates one of the compares (I should know these tricks by now...
bangs head against wall).
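
For anyone following along, the "sub al,33" trick can be sketched in C like this. It assumes the highlighted range is 33..126 inclusive (inferred from the 33 and the 127-33 table size mentioned in this thread); the name is_highlighted is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when c lies in 33..126 (assumed range), else 0.
   Subtracting 33 maps 33..126 onto 0..93. Anything below 33 wraps
   around to a large unsigned value, so a single unsigned compare
   checks both ends of the range at once. */
static int is_highlighted(uint8_t c)
{
    return (uint8_t)(c - 33u) < 94u;
}
```

On the 8088 the same idea is a subtract followed by one unsigned compare and branch, instead of two compares and two branches.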

The next optimization is to notice that in real life, most strings will

...except for these strings. The application in question is a binary
file viewer :-)

Even so, I believe you should do the statistics: if you average two or more characters between each change of range, you should at least consider an unrolled loop for each type, which would mean that you need only a single branch for each domain change.

consist of bytes from a single range, printable or not, and if you can
modify the last byte, then you can save the loop termination test on
each iteration:
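
The code that followed this sentence in the original post is not quoted here, so what follows is only a sketch of the general sentinel idea in C, not Terje's version. It assumes the buffer is writable and non-empty, and that the two classes are "33..126" and "everything else"; in_range and run_length are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 when c is in 33..126 (assumed range), else 0. */
static int in_range(uint8_t c) { return (uint8_t)(c - 33u) < 94u; }

/* Length of the leading run of bytes in the same class as buf[0].
   The last byte is temporarily overwritten with a byte of the opposite
   class, so the scan loop needs no "end of buffer" test. */
static size_t run_length(uint8_t *buf, size_t len)
{
    assert(len > 0);
    int cls = in_range(buf[0]);
    uint8_t saved = buf[len - 1];

    buf[len - 1] = cls ? 0 : 'A';       /* sentinel of the other class */

    size_t i = 0;
    while (in_range(buf[i]) == cls)     /* only one test per byte */
        i++;

    buf[len - 1] = saved;               /* restore the real last byte */

    /* If we stopped on the sentinel, the real last byte may still
       belong to the run. */
    if (i == len - 1 && in_range(saved) == cls)
        i++;
    return i;
}
```

The saving is one compare-and-branch per byte inside the loop, paid for with a save, a store and a restore per call, so it only helps when runs are reasonably long.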

Hey, something else I never thought of! This is a great approach,
except, like I mentioned, the actual use for the code is viewing binary
files like .EXEs, etc., so the state will mostly be changing 10 times or
more per line. With that many state changes, I'm looking more toward
the lookup table approach since it will only require (127-33) bytes for
the table. I don't know how I'm going to address the table; probably
DB it right into the code and JMP around it or something.

Lookup tables aren't that good on 8088:

You want to modify the high half (AH), right? A plain XLAT only replaces AL with the table entry, so you cannot simply use it: you need more instructions and instruction bytes, plus the table load itself.

It might be a win, but I'm not at all sure!
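
For comparison, the table approach looks roughly like this in C. This is a sketch under several assumptions: a full 256-entry table instead of the (127-33)-byte one mentioned above, attribute values 0x07 and 0x0F picked purely for illustration, and hypothetical names throughout. The character goes in the low byte and the attribute in the high byte, as in a text-mode screen cell.

```c
#include <stdint.h>

/* Assumed attributes: 0x07 = normal, 0x0F = bright (illustrative only). */
enum { ATTR_NORMAL = 0x07, ATTR_BRIGHT = 0x0F };

static uint8_t attr_table[256];

/* Fill the table once: bytes 33..126 (assumed range) get the bright
   attribute, everything else the normal one. */
static void init_attr_table(void)
{
    for (unsigned c = 0; c < 256; c++)
        attr_table[c] = (c >= 33 && c <= 126) ? ATTR_BRIGHT : ATTR_NORMAL;
}

/* Build one screen cell: attribute in the high byte (AH on the 8088),
   character in the low byte (AL). */
static uint16_t make_cell(uint8_t ch)
{
    return (uint16_t)((attr_table[ch] << 8) | ch);
}
```

The lookup removes every branch, but each character costs an extra table read over the bus, which is the cost being weighed above.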

Terje

--
- <Terje.Mathisen@xxxxxxxxxxxxx>
"almost all programming can be viewed as an exercise in caching"
