Re: Optimization help please



I suspect that if you read the file using multiple buffers and threads, you
can get much better performance by using overlapped IO. That permits one
buffer to be CRC'd while the next buffer is being filled. Multiple threads
would let those with the new dual-core or SMP systems use the available
resources more fully.
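
For the buffer-at-a-time scheme to work, the CRC routine has to accept a
running value so each filled buffer can be folded in as it arrives. Here is
a minimal sketch of that, assuming the standard reflected CRC-32 polynomial
(0xEDB88320); your own routine is not shown in the thread and may differ.
The name crc32_update is my own, and the overlapped read itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): fold one buffer into a
 * running CRC-32. Pass 0 as `crc` for the first buffer, then pass the
 * previous return value for each later buffer. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;                 /* undo the final inversion of the last call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)   /* one shift per input bit */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                /* standard final inversion */
}
```

While one buffer is inside crc32_update, the other can be waiting on its
read. Feeding the buffers in file order gives the same result as one pass
over the whole file.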

The VC 6 compiler is not as efficient as the current version, and Intel's
compiler is the best I have heard about. What I am saying is that very few
people can improve such a routine using assembly. I am also saying that,
given the variations in current processors, any assembler code will have to
contain compromises that decrease speed when one of the other processors is
encountered. It would be possible to write a special routine for each
processor and use a function pointer to call the appropriate one based upon
the CPU.
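
A rough sketch of the function-pointer idea follows. The two variants
(bit-at-a-time and table-driven) only stand in for real per-processor
routines, and enum cpu_kind is a made-up placeholder: actual CPU detection
(for example via CPUID) is not shown. Standard CRC-32 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32_fn)(const unsigned char *buf, size_t len);

/* Variant 1: bit-at-a-time, no lookup table. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static uint32_t crc_table[256];

/* Fill the 256-entry table used by variant 2. */
static void crc32_build_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; bit++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        crc_table[n] = c;
    }
}

/* Variant 2: one table lookup per byte. Requires the table to be built. */
static uint32_t crc32_table_driven(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}

/* Hypothetical CPU classes; a real program would derive this at startup. */
enum cpu_kind { CPU_GENERIC, CPU_BIG_CACHE };

/* Pick a routine once; callers then go through the returned pointer. */
crc32_fn crc32_select(enum cpu_kind kind)
{
    if (kind == CPU_BIG_CACHE) {
        crc32_build_table();    /* done once here, not tested per call */
        return crc32_table_driven;
    }
    return crc32_bitwise;
}
```

The selection happens once at startup, so the per-call cost is a single
indirect call.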

Have the compiler output assembler code for that routine and see if any
inefficiencies can be found. You may find tests repeated inside the loop
that could be done once. You may find stores of the data to memory on each
loop cycle that are unnecessary and inefficient. You might declare the CRC
variable with the register keyword. Very short routines offer very little
opportunity for speed increases. Using some of the newer instructions (SSE,
MMX, etc.) may permit more bytes to be processed in each loop cycle.
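
To illustrate the point about stores on every loop cycle, here is a sketch
of the same loop written two ways. The function names are mine and standard
CRC-32 is assumed. Whether a given compiler really emits the extra stores
is something to confirm in its assembler listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into the running CRC state (no pre/post inversion). */
static uint32_t crc32_step(uint32_t crc, unsigned char byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Before: the running value lives behind a pointer. Since `buf` is a
 * character pointer it may alias `*crc`, so the compiler generally has to
 * write `*crc` back to memory on every iteration. */
void crc32_via_pointer(uint32_t *crc, const unsigned char *buf, size_t len)
{
    assert(crc != NULL);
    *crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        *crc = crc32_step(*crc, buf[i]);
    *crc = ~*crc;
}

/* After: a local accumulator that can stay in a register, with a single
 * store when the loop is finished. */
void crc32_via_local(uint32_t *crc, const unsigned char *buf, size_t len)
{
    uint32_t acc = 0xFFFFFFFFu;
    assert(crc != NULL);
    for (size_t i = 0; i < len; i++)
        acc = crc32_step(acc, buf[i]);
    *crc = ~acc;
}
```

Both produce the same checksum; only the memory traffic inside the loop
differs.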

Some more information about how many of these ideas are already in place
would be helpful. Just saying you need a rather simple routine optimized
would cause most of us to ask a lot of questions. You said you needed 'such
an algorithm', which I take to mean that what you really need is just a way
to determine whether a block of data or a file has changed.

Just to make sure you have already eliminated the obvious: the IO will cost
far more cycles than the CRC routine. Depending upon the storage media, it
may be necessary to use several threads, all of them requesting data to be
read. The CRC can then be computed on each 'block' and the algorithm can
combine the results, which lets the system run as fast as it possibly can.
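
Combining per-block CRCs is possible because CRC is linear. Below is a
naive sketch, assuming standard CRC-32 and using names of my own: it pushes
the first block's CRC through as many zero bytes as the second block is
long, then XORs in the second block's CRC. This version costs time
proportional to the second block's length, so it only shows the principle.
zlib's crc32_combine does the same job in logarithmic time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 of one independent block. */
uint32_t crc32_block(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC of A followed by B, given crc(A), crc(B) and the length of B.
 * Naive: advances crc_a through len_b zero bytes, one bit at a time. */
uint32_t crc32_combine_naive(uint32_t crc_a, uint32_t crc_b, size_t len_b)
{
    for (size_t i = 0; i < len_b; i++)
        for (int bit = 0; bit < 8; bit++)
            crc_a = (crc_a >> 1) ^ (0xEDB88320u & (0u - (crc_a & 1u)));
    return crc_a ^ crc_b;
}
```

Each thread would call crc32_block on its own block, and the results would
be combined in file order once all of them are done.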

You did say that the IO has been optimized, so some of the above may already
be implemented.

"Floptimize" <spamtrap@xxxxxxxxxx> wrote in message
news:1126800211.784160.313790@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Thanks for taking time to reply to my post, however I find it
> rhetorical and presumptuous. Please don't take offence to my reply.
>
> Of course I understand the algorithm, and of course I know it's not C++,
> and of course we have a need for such an algorithm in our design. The
> need to translate it into assembly is NOT just for fun.
>
> If you are disappointed in my post because I am asking for assistance,
> say it directly.
>
> I respect your valid opinion about the efficiency of compilers, and
> target processors. Our application is not designed for specific
> processors, just Windows 2000 and up, on x86 architecture.
>
> Our applications transmit video and audio files over a 100mb network.
> The files can be very large, and we do not have absolute control of the
> files at either end. We use CRC32 checksums to make logical decisions
> about whether or not to send the files over the network. We also use
> other indicators such as timestamps and file size. Our applications
> cache the checksums for the lifetime of the process, so when
> referencing the file for the first time we must re-compute the checksum
> of the file. Reading the file in its entirety from disk and computing the
> checksum is CPU and disk intensive, as you well know. We have done as
> well as we can to optimize the disk read performance, which is the
> biggest bottleneck, and now we want to improve the performance of the
> checksum algorithm. Maybe there is another way to achieve our goals
> that we are unaware of....
>
> My post is prompted by an article I read about benchmarks of checksum
> algorithms written in assembly versus C. The algorithm I read about is
> not standard CRC32, and I am not experienced enough in Intel x86
> assembly language to convert my routine from C.
>
> Another reason I asked for help is that, as I am a passionate C++
> developer, there are passionate assembly developers who enjoy helping
> others. I was hoping for help from someone like myself.
>
> There you have it.
>

