Re: count2.asm
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Tue, 26 Jul 2005 10:35:37 -0400
Alex McDonald wrote:
[code snipped]
Good lateral thinking;
The hippies call it "getting sideways" :)
Seriously, it illustrates the importance of getting the optimal algoritm figured out, before optimizing the implementation of a less optimal algorithm. I'm not sure we've got the "best" one yet. There are circumstances that could alter the equation.
For example, the "word count" doesn't count words delimited by tabs, or by linefeeds (or other newline convention). To add that functionallity, instead of just "do_punct", another *different* command might be added to the jump table - or more "special case" code added to the loop. This might shift the balance back toward the jump table approach.
Logically, it seemed slow to do multiple adds to several "character type counters", on each character, when we could do a smaller number of adds later. But Robert has raised the possibility of a separate thread to do the calculations while the "main thread" is blocked, waiting on the disk. If we could do those multiple adds "for free", since we'd only be waiting if we didn't do something, this might become faster than doing fewer total adds in "new" time.
I haven't completely given up on "count1.asm", but I think this one is better.
given a sort & output, you should be able to see the letter frequency of etaoin shrdlu too.
I'm trying not to rush that. The data set is small enough so a bubble sort wouldn't kill us, but it might be nice to write a "good" sort, even if we don't really need it for this.
Interestingly, perhaps the output side would benefit from the table method; there's a degree of repetitive pattern in this;
Yes, I'd thought of that. I thought about structures with a "message for this one" and "indices for this one" members. There's just enough difference in the "almost common" code, that I decided to do it as you see - "for now".
mov esi, msg_uppercons call putz mov ebx, uppercons_list call add_em_up add [cons], eax add [alphas], eax call showeaxd
mov esi, msg_lowercons call putz mov ebx, lowercons_list call add_em_up add [cons], eax add [alphas], eax call showeaxd
Maybe a common subroutine taking "pointer to message", "pointer to indices", and "pointer to other adds" as parameters, or something?
This;
ignore: mov ecx, eax cmp esi, edi jz show_results lodsb cmp al, 80h jae ignore add dword [charcounts + eax * 4], 1 %if 1 cmp al, ' ' jnz ignore cmp ecx, eax jz ignore add dword [words], 1 %endif
could be replaced by
ignore: cmp esi, edi jz show_results movzx eax, [esi] add dword [charcounts + eax*4], # 1 add esi, # 1 %if 1 mov edx, eax sub eax, # $21 sbb eax, eax sub ecx, # $21 sbb ecx, ecx not ecx and eax, ecx ; -1 if eax=$20<>ecx sub dword [words], eax mov ecx, edx jmp short ignore %endif
Replacing the cmp/jcc is debatable; it might or might not be faster, depending on the word length.
That might be a good one. "Intuitively", it doesn't seem possible that doing "all that" every time just to save a conditional jump would work, but I know such "sbb tricks" do win out in a lot of instances.
How about unrolling some of the loop? i.e
cmp esi, edi jz deal_with_last_bytes mov edx, [esi] shrd edx, eax, 8 shr eax, 24 add dword [charcounts + eax*4], # 1 shrd edx, eax, 8 shr eax, 24 add dword [charcounts + eax*4], # 1 shrd edx, eax, 8 shr eax, 24 add dword [charcounts + eax*4], # 1 add esi, # 4 ...
That might be another good one. Aren't shifts slow on some processors? Reading four bytes at once seems good. Breaking it back down into four indices seems like it might eat up the savings. Another one I'd have to try to see if it gains us anything.
Thanks for the good ideas!
Best, Frank .
- Follow-Ups:
- Re: count2.asm
- From: wolfgang kern
- Re: count2.asm
- References:
- count2.asm
- From: Frank Kotler
- Re: count2.asm
- From: Alex McDonald
- count2.asm
- Prev by Date: Re: count2.asm
- Next by Date: Re: count2.asm
- Previous by thread: Re: count2.asm
- Next by thread: Re: count2.asm
- Index(es):
Relevant Pages
|