Re: improve strlen
- From: "hutch--" <spamtrap@xxxxxxxxxx>
- Date: Fri, 28 Oct 2005 02:58:39 +0000 (UTC)
jukka,
I think maybe you have missed the direction I have commented in. I
certainly see writing multiport code as a worthwhile endeavour but I
don't see it as a replacement for hardware specific code where
assembler has no peers in terms of size and speed.
The original post in this topic was a member looking for the difference
between C code and an assembler version of a byte scanner for
determining the length of a zero terminated string. I posted for him a
slightly modified algorithm that was written by Agner Fog in about 1996
and in code terms, that is a reasonably long life for an algo design in
assembler.
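For readers without the earlier post to hand, the general idea of
scanning a machine word at a time can be sketched in C. This is only an
illustration of the technique and not the routine posted in the thread;
the name word_strlen and the 32-bit word size are assumptions made for
the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of word-at-a-time scanning, not the
   assembler routine discussed in the thread. */
size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Step byte by byte until p sits on a 4-byte boundary. */
    while (((uintptr_t)p & 3u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    /* Read one aligned 32-bit word per iteration. An aligned word never
       straddles a page, but it may read up to 3 bytes past the
       terminator, which ISO C does not formally sanction. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        /* Non-zero exactly when at least one byte of w is zero. */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += 4;
    }

    /* Locate the zero byte inside the final word. */
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
```

The alignment loop is what makes the over-read harmless in practice on
x86, since an aligned 4-byte read cannot cross into an unmapped page.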
The arguments against writing assembler are usually the development
cycle time, yet to produce a nearly as fast version in C++ I suggest
that you have probably spent more time than it would take to write it
in assembler, and while the development may be useful to you in terms of
portability, it is neither a development time nor a speed advantage.
> > You remind me in the young and wild days..
In my youth I wrote ANSI C but time and cynicism led me down the road
of writing pure assembler in many places because portability in almost
every instance is a myth. My main use for a C compiler these days is
ratting through the mountain of old C junk for decent algorithm designs
which CL.EXE easily converts to MASM format assembler which is then a
good target for manual optimisation.
I am also not without criticism of current C compiler design in terms
of code generation. RISC theory code design may be convenient for
compiler designers but current x86 hardware is very badly suited for
such theory with its restricted range of general purpose registers and
you regularly see redundant loads and stores so that trivial API calls
and the like are performed in registers.
This is left over 1990s technology when earlier than PII hardware was
faster that way. Then there is the problem of using the same
optimisation strategy for all code in a module and while you can
separate the fast code from the hack OS code, few would bother to do
this and even fewer would know what code matters and what does not.
> > Wait a second champ, I never said that, I am advocating the thought
> > that resorting to assembly as First thing is folly. Resorting to it
> > when there is need isn't.
The problem with this view is that it escalates upwards in the same
manner. Many VB programmers would use the same argument against C++
where you only need to write "low level" code on a needs basis so you
don't natively write in C++.
There is in fact an ever growing number of people who do use assembler
as a first choice for some tasks and it is purely a matter of
familiarity with the language format. Many with a high level background
don't properly understand that assembler can routinely work with the
12000 plus API calls, the near massive collection of compatible C
libraries, libraries written in assembler and so on.
Assembler programming is by no means restricted to plugging up the
defects in compiler code output but much more to do with freedom of
design and architecture as well as chasing speed where it matters.
Instruction choice is a matter of targeted market width. If the Linux
desktop market is 2%, gaming is 0.01% of the sum total market and it
makes high demands on video, memory and processor performance, all of
which change on a weekly basis to a later, faster and more expensive
choice.
I have always been stuck with targeting code at the widest number of
people and this means the furthest backwards compatibility for the
current Windows OS platform. This says primarily 486 code but there is
more to it than just linear backwards compatibility. MMX was a big deal
with a P200 MMX processor but it does not perform reliably across all
of the later processors. It was also cursed with sharing the FP
registers which excluded joint FP, MMX operations without a massively
expensive time delay.
SSE hit the deck with a PIII and was occasionally faster than MMX but
as usual the limiting factor is memory bandwidth. The gain with SSE(2)
is the non temporal writes where you can clock the speed difference in
real time.
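As a rough sketch of what a non temporal write looks like from C, the
SSE2 intrinsic _mm_stream_si128 issues a store that bypasses the cache.
The helper name stream_fill is hypothetical, and the example assumes an
x86 target with SSE2, a 16-byte aligned destination and a length that is
a multiple of 16.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill 'bytes' bytes at dst with 'value'.
   Assumes dst is 16-byte aligned and bytes is a multiple of 16. */
void stream_fill(void *dst, uint8_t value, size_t bytes)
{
    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    size_t i;

    for (i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(p + i, v);  /* store that bypasses the cache */

    _mm_sfence();  /* order the streamed stores before returning */
}
```

Any gain only shows up on buffers well beyond cache size, so it needs
to be clocked on the target hardware rather than assumed.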
I also see this as the saving factor with compiler generated code, in that
memory bandwidth compresses the difference between shorter code with
fewer instructions as against a mountain of redundant loads and stores.
Put simply, the processor is still some powers faster than current DDR 400
and later memory and this allows a reasonably large number of redundant
instructions to be placed between memory access instructions.
I have an example in mind clocking an insertion sort where the removal
of 33 redundant loads and stores made no difference in the time of the
algo. The only thing that did make it faster was reducing the number of
memory accesses.
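To illustrate the kind of change meant by reducing memory accesses, here
is a minimal insertion sort in C that holds the element being placed in
a local and shifts its neighbours, rather than swapping pairs in memory.
It is a hypothetical example and not the routine that was clocked.

```c
#include <stddef.h>

/* Hypothetical illustration, not the routine that was timed. */
void insertion_sort(int *a, size_t n)
{
    size_t i, j;

    for (i = 1; i < n; ++i) {
        int key = a[i];          /* one load; key can stay in a register */
        j = i;
        /* Shift larger elements up: one load and one store per step,
           instead of the two loads and two stores a swap would cost. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;              /* single store to place the element */
    }
}
```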
> > Ofcourse. But only when it pays off in some ways, makes a difference to
> > real-world software. strlen() is a good example, where it doesn't make
> > a didly-doo's difference to real programs performance in most of the
> > real-world, production software.
A single string length algo is only a very small component of common
tasks yet if the same indifference is applied to the sum total of
software design, you end up with the slow bloated style of C++ that is
common these days in commercial applications and hardware is not
getting faster but software is still getting bigger and slower.
It's not what "CAN" be done but what "DOES" get done with the majority
of software production tools that is the measure of the tool. No doubt
a well written C library will easily produce good quality final
application code in many instances but a vast majority of modern
applications are not in this class.
The hallmark of modern application production is massive size
increases, reduced functionality, inappropriately used threaded code
with endless timing lags and very high demands on current hardware.
> > I only took it into myself to write
> > the C++ version to validate that my theory, which is forged by years of
> > practise, is still correct. It still is.
This is fine and I hope it was useful to you but with the development
time to produce a C++ version that is nearly as fast as an old
assembler version, development time goes for the assembler code, not
the C++ code.
> > That depends who you writing software for. If you write it for
> > yourself, okay. If it is freeware, open source.. who cares where it
> > runs besides the author, or those who contribute. If it is application
> > written for a customer, usually they dictate the terms.
The project that I maintain is used by a very large number of people
and it must remain useful to this number of people so there is no real
point in targeting the 0.01% doing unusual things. People who need
code in this range have the perfect tool with an assembler to pick the
advantages they require and simply write what they need.
> > If your point is to make fastest possible code, you take the pains to
> > write the code in assembly, but then ignore latest instructions
> > possible on x86 platform what's wrong with the picture?
The problem with this comment is it assumes another language primacy
yet there are enough people who write simple things in assembler
without feeding it through the restrictions of Delphi or C++ or
whatever else. Apart from speed issues, near complete freedom in terms
of architecture has a lot going for it and in the case of MASM, its
pre-processor will eat C compilers alive in terms of capacity.
Being able to design your own language free from the claptrap is one of
the large advantages in assembler programming.
> > I don't know why you still want to support 386 while writing *windows*
> > software in 2005.
Very simple actually, the vast majority of computers around the world
are not high end dual core AMD 64 Opterons with > 8 gig of memory but
far more humble machines that profit from small fast software written
in assembler where the later slow bloated hardware specific stuff just
won't run on such boxes.
Really high end graphics run on SGI boxes and when you don't need to
target a wide range of people, this will deliver performance that the PC
market is some powers slower than.
Regards,
hutch at movsd dot com