Re: multibyte subtraction using AAS and ASCII Decimals
- From: "Wolfgang Kern" <nowhere@xxxxxxxxxxx>
- Date: Mon, 23 Apr 2007 20:58:48 +0200
"cr88192" wrote:
[..]
the assembler versions are just horrors.I don't think so...there is a lot of power in it.
well, I was referring mostly to "fully expanded" multiplies (ie: all theworse > (for an autogenerated version, several kloc for a single function,
loops are unrolled into a mass of fixed-purpose code...).
in C, this results in huge masses of code, and in assembler it is far
....).
multiplies can be scary...
Loop unrolling may not always speed things up.
Many fetch and cache penalties may destroy the gained saving on
cc-branches.
[]
well, loops involve a mass of compares and conditional jumps, so if it is
flattened out into a single mass of instructions, it should go faster.
still, this is only reasonable for smaller sizes it seems.
64 bits, yes, fairly simple (4 multiplies for a 64->128 bit expansion).
128 bits, at least, still reasonable to type out.
256 bits, errm, a chore to type even in C.
512 bits, errm, no. this beast comes out at 3 kloc.
1024 bits, quite a bit worse.
also, for the 256 and greater, I end up loosing as much register density,
so it makes more sense to just do it in C and hope the compiler can do it
good enough.
Might be interesting to compare performance ...
themsome operations seem terribly dubious though, such as the large
multiplies:
they are so bulky I have doubt as to what the processor will do with
(like 3k lines of assembler for a 512-bit multiply, insane...).
Look at my homepage example of how to do fast MUL on any integer size.ok.
It easy can be expanded to act on more than 512 bits.
it is dense and small, but would involve more operations/per multiply than
my versions do.
basically, my versions have 2 memory accesses per mul, wheras this version
has 4 (and a few other accesses to memory).
With only two memory accesses ?
I see you work just on 32-bit junks and add edx before the next mul.
it also has the overhead of implementing loops (IME, conditional jumps are
unreasonably expensive, and thus I prefer to avoid them whenever
reasonable).
then again, I begin to wonder if for large types (above 128 or 256 bits),
if it makes more sense to just use loops.
Right, I haven't checked the break-even point [fetch vs. branch penalties],
I just found it more convenient to have only one short routine to cover
all MUL above 64*64-bits.
problem with multiplies is that they get larger at an n^2 rate...
unless there is some factoring teqnique of which I am not aware
(ie: as there is with the DCT...).
I'm not familiar with the term "DCT".
just me remembering, the DCT cam be implemented with a complexity of O(n
log2 n) or such (even though cannonically, it takes O(n^2)).
part of the reason is that when expanded (and the coefficients are
evaluated), there are some amount of shared terms, which can be factored
giving a fairly dense calculation.
however, I am not sure if anything like this is possible for multiplies, I
doubt it.
I got a partial LOG LUT but it's limited to 256 bits resolution,
it's fine for num-Ascii->bin conversion, divide and a few more,
but it wont speed up large-by-large MUL.
[...]
ended up burning out on this.[...]
but oh well...).
actually, I still kind of doubt that the code will work reliably.
Sooner or later you will find a solution which fits your goal :)
__
wolfgang
.
- Follow-Ups:
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: cr88192
- Re: multibyte subtraction using AAS and ASCII Decimals
- References:
- multibyte subtraction using AAS and ASCII Decimals
- From: Brian
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: Wolfgang Kern
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: cr88192
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: Wolfgang Kern
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: cr88192
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: Wolfgang Kern
- Re: multibyte subtraction using AAS and ASCII Decimals
- From: cr88192
- multibyte subtraction using AAS and ASCII Decimals
- Prev by Date: Re: passing variables between C++ and assembly language
- Next by Date: Re: updated assembler
- Previous by thread: Re: multibyte subtraction using AAS and ASCII Decimals
- Next by thread: Re: multibyte subtraction using AAS and ASCII Decimals
- Index(es):
Relevant Pages
|