Re: count2.asm



Alex McDonald wrote:

[code snipped]

Good lateral thinking;

The hippies call it "getting sideways" :)

Seriously, it illustrates the importance of getting the optimal algoritm figured out, before optimizing the implementation of a less optimal algorithm. I'm not sure we've got the "best" one yet. There are circumstances that could alter the equation.

For example, the "word count" doesn't count words delimited by tabs, or by linefeeds (or other newline convention). To add that functionallity, instead of just "do_punct", another *different* command might be added to the jump table - or more "special case" code added to the loop. This might shift the balance back toward the jump table approach.

Logically, it seemed slow to do multiple adds to several "character type counters", on each character, when we could do a smaller number of adds later. But Robert has raised the possibility of a separate thread to do the calculations while the "main thread" is blocked, waiting on the disk. If we could do those multiple adds "for free", since we'd only be waiting if we didn't do something, this might become faster than doing fewer total adds in "new" time.

I haven't completely given up on "count1.asm", but I think this one is better.

given a sort & output, you should be able to see the letter frequency of etaoin shrdlu too.

I'm trying not to rush that. The data set is small enough so a bubble sort wouldn't kill us, but it might be nice to write a "good" sort, even if we don't really need it for this.


Interestingly, perhaps the output side would benefit from the table method; there's a degree of repetitive pattern in this;

Yes, I'd thought of that. I thought about structures with a "message for this one" and "indices for this one" members. There's just enough difference in the "almost common" code, that I decided to do it as you see - "for now".


    mov esi, msg_uppercons
    call putz
    mov ebx, uppercons_list
    call add_em_up
    add [cons], eax
    add [alphas], eax
    call showeaxd

    mov esi, msg_lowercons
    call putz
    mov ebx, lowercons_list
    call add_em_up
    add [cons], eax
    add [alphas], eax
    call showeaxd

Maybe a common subroutine taking "pointer to message", "pointer to indices", and "pointer to other adds" as parameters, or something?


This;

ignore:
    mov ecx, eax
    cmp esi, edi
    jz show_results
    lodsb
    cmp al, 80h
    jae ignore
    add dword [charcounts + eax * 4], 1
%if 1
    cmp al, ' '
    jnz ignore
    cmp ecx, eax
    jz ignore
    add dword [words], 1
%endif

could be replaced by

ignore:
    cmp esi, edi
    jz show_results
    movzx eax, [esi]
    add dword [charcounts + eax*4], # 1
    add esi, # 1
%if 1
    mov edx, eax
    sub eax, # $21
    sbb eax, eax
    sub ecx, # $21
    sbb ecx, ecx
    not ecx
    and eax, ecx ; -1 if eax=$20<>ecx
    sub dword [words], eax
    mov ecx, edx
    jmp short ignore
%endif

Replacing the cmp/jcc is debatable; it might or might not be faster, depending on the word length.

That might be a good one. "Intuitively", it doesn't seem possible that doing "all that" every time just to save a conditional jump would work, but I know such "sbb tricks" do win out in a lot of instances.


How about unrolling some of the loop? i.e

    cmp esi, edi
    jz deal_with_last_bytes
    mov edx, [esi]
    shrd edx, eax, 8
    shr  eax, 24
    add dword [charcounts + eax*4], # 1
    shrd edx, eax, 8
    shr  eax, 24
    add dword [charcounts + eax*4], # 1
    shrd edx, eax, 8
    shr  eax, 24
    add dword [charcounts + eax*4], # 1
    add esi, # 4
    ...

That might be another good one. Aren't shifts slow on some processors? Reading four bytes at once seems good. Breaking it back down into four indices seems like it might eat up the savings. Another one I'd have to try to see if it gains us anything.


Thanks for the good ideas!

Best,
Frank
.



Relevant Pages