Re: Process Register Directly Is Slow?
- From: "Bryan Parkoff" <spamtrap@xxxxxxxxxx>
- Date: Fri, 26 Jan 2007 13:54:58 -0600
"Gil Hamilton" <spamtrap@xxxxxxxxxx> wrote in message
news:Xns98C44D4ECF0DCgilhamiltonhotmailco@xxxxxxxxxxxxxxxx
"Bryan Parkoff" <spamtrap@xxxxxxxxxx> wrote:
Many assembly programmers claim that loading data from memory or
storing data into memory is always slow. They recommend loading data
from memory into a register first, since processing it directly in
registers may be faster.
Do you really need a routine to process this register directly? What
about a table? The table is 48 bytes long and the routine is about
29 bytes long. The routine translates 0, 1, 2, 3, etc. to
$400, $480, $500, $580, etc.
I would argue for using only 2-3 x86 instructions, like below.
add ebx, 02h
mov al, byte ptr [0416000h+ebx]
Hmmm.... where to start...
First: it isn't clear what you're asking (or perhaps advocating).
Second: all other things being equal, yes, accessing memory is slower
than accessing registers. Hence, a simple enough transform might be
better done with register manipulations than with a table lookup.
However, there are many other factors that determine how fast something
runs, including size of code, branches in the code, exact mix of
instructions used, which processor model you're running on, alignment of
the code and data, cache misses, page faults / TLB misses, and others.
So it's impossible to say definitively without knowing all those things.
Third: the assembly code you give does not do what you say it does above
(at least assuming the obvious extrapolation of the given sequences).
Fourth: you would generally be better off writing your code in C and
letting your compiler generate correct machine code than writing
assembly code that doesn't do what you intend it to do.
For example, to accomplish what you say you want above (map 0,1,2,3,...
[Line] to hex 400,480,500,580... [Base]), use:
Base = (Line * 0x80) + 0x400;
which a decent compiler will optimize into something like:
movzx ax, byte ptr [Line]
add ax, 8
shl ax, 7
mov [Base], ax
This would be better on most processor models than either a table lookup
or your (apparently incorrect) assembly code.
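As a rough check of the claim above, the multiply-and-add formula and the add-then-shift sequence can be written side by side in C. This is only a sketch: the function names base_mul and base_shift are made up for illustration, and Line is assumed to fit in one byte.

```c
#include <stdint.h>

/* Hypothetical helper names; only the formula and the add/shift
   sequence come from the post. Line is assumed to be one byte. */

/* Multiply-and-add form: Base = (Line * 0x80) + 0x400 */
static uint16_t base_mul(uint8_t line)
{
    return (uint16_t)(line * 0x80 + 0x400);
}

/* Add-then-shift form, mirroring "add ax, 8" followed by "shl ax, 7" */
static uint16_t base_shift(uint8_t line)
{
    return (uint16_t)((line + 8) << 7);
}
```

The two agree because (Line + 8) * 128 = Line * 128 + 1024, and 1024 is 0x400.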
Please look at my C++ source code using __asm and give me your
opinion. Does this routine perform worse because it has 18
instructions, compared with a version that has only 2
instructions?
The C code equivalent to your assembly code is something like this:
unsigned char ch, cl;
ch = ((Line >> 1) & 0x03) | 0x04;
cl = (Line & 0x18);
if (Line & 1)
    cl += 0x80;
cl |= cl << 2;
Base = (ch << 8) | cl;
Is that really what you intended?
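For anyone who wants to try the translation above, here is a minimal self-contained sketch of it as a function. The name base_from_line and the fixed-width types are assumptions for illustration. The body follows the quoted C code, with the 8-bit truncation of the shift made explicit.

```c
#include <stdint.h>

/* Hypothetical wrapper name; the body follows the C translation
   quoted above. */
static uint16_t base_from_line(unsigned line)
{
    uint8_t ch, cl;

    ch = ((line >> 1) & 0x03) | 0x04;  /* high byte: 4..7 */
    cl = line & 0x18;                  /* group bits: 0x00, 0x08 or 0x10 */
    if (line & 1)
        cl += 0x80;                    /* odd lines add 0x80 */
    cl |= (uint8_t)(cl << 2);          /* shift is truncated to 8 bits,
                                          as in an 8-bit register */
    return (uint16_t)((ch << 8) | cl);
}
```

For example, base_from_line(9) gives 0x4A8 and base_from_line(17) gives 0x4D0, so the result is not the plain 0x80-per-line sequence once Line goes past 7.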
GH,
Yes, that is a good explanation, and it is what I intended my original
code to do. Thank you for translating the assembly code into C++. I am
sorry that my English is not good. I always test for speed in C++ first,
before writing assembly. Your example, "Base = (Line * 0x80) + 0x400",
looks good, but my original code is trickier because the table it
generates has three groups, shown below. My original code works fine.
I am only asking for your opinion: do you think a look-up table is
slower than my original routine?
I don't use a profiler; instead I use the time stamp instruction to
measure the total time the instructions take. This lookup table has 24
entries. What if it grows to 1,024 or 20,480 entries? A table lookup
might then be slow, and a short routine (10-30 instructions) might be
acceptable. A large lookup table may cause cache misses when a loop
loads from and stores to memory heavily.
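The kind of measurement described above could be sketched roughly as follows. This is not the code used for the timings in this post: it swaps in the portable clock() function for the time stamp instruction, and the names base_fn, time_base_fn and linear_base are made up for illustration. Calling through a function pointer adds overhead of its own, and an optimizing compiler may fold the loop, so treat any numbers from it with caution.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical type: any candidate that maps a line number to a base. */
typedef uint16_t (*base_fn)(unsigned line);

/* Hypothetical helper: calls fn for lines 0..23, reps times over, and
   returns the CPU time used in seconds as reported by clock(). The sum
   of all results is stored in *checksum so the calls have a visible
   effect and two candidates can be compared for equal output. */
static double time_base_fn(base_fn fn, unsigned long reps, uint32_t *checksum)
{
    uint32_t sum = 0;
    clock_t start = clock();

    for (unsigned long r = 0; r < reps; r++)
        for (unsigned line = 0; line < 24; line++)
            sum += fn(line);

    clock_t stop = clock();
    *checksum = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Placeholder candidate: the linear formula quoted earlier in the
   thread. A table lookup or the original routine would go here. */
static uint16_t linear_base(unsigned line)
{
    return (uint16_t)(line * 0x80 + 0x400);
}
```

Timing a table version and a computed version with the same reps, and checking that both checksums match, gives a like-for-like comparison. A 24-entry table will sit in cache during such a loop, so this sketch says little about the 1,024 or 20,480 entry case.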
Group 1
0 400
1 480
2 500
3 580
4 600
5 680
6 700
7 780
Group 2
8 428
9 4a8
10 528
11 5a8
12 628
13 6a8
14 728
15 7a8
Group 3
16 450
17 4d0
18 550
19 5d0
20 650
21 6d0
22 750
23 7d0
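The three groups above seem to follow a simple pattern: each line within a group adds $80, and each group adds $28. A sketch of both approaches in C, under that assumption, is below. The names base_table, base_closed_form and count_mismatches are made up for illustration, and the closed form is an observation about the listed values, not the original routine.

```c
#include <stdint.h>

/* The 24 entries listed above as 16-bit values (48 bytes in total). */
static const uint16_t base_table[24] = {
    0x400, 0x480, 0x500, 0x580, 0x600, 0x680, 0x700, 0x780, /* group 1 */
    0x428, 0x4A8, 0x528, 0x5A8, 0x628, 0x6A8, 0x728, 0x7A8, /* group 2 */
    0x450, 0x4D0, 0x550, 0x5D0, 0x650, 0x6D0, 0x750, 0x7D0  /* group 3 */
};

/* Hypothetical closed form: 0x80 per line within a group of eight,
   plus 0x28 per group. */
static uint16_t base_closed_form(unsigned line)
{
    return (uint16_t)(0x400 + (line & 7) * 0x80 + (line >> 3) * 0x28);
}

/* Returns how many of the lines 0..23 differ between table and formula. */
static int count_mismatches(void)
{
    int mismatches = 0;

    for (unsigned line = 0; line < 24; line++)
        if (base_table[line] != base_closed_form(line))
            mismatches++;
    return mismatches;
}
```

If count_mismatches() returns 0, the table and the formula are interchangeable for these 24 lines, and the choice between them comes down to measured speed.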
Bryan Parkoff