Re: Making HLA even more high-level.



/*
This is what I call the "Software Tools" myth. I've called it this
because of the claim in the (otherwise fine) book "Software Tools" by
Kernighan and Plauger that there is no need to optimize a getc function
because the underlying I/O operations overwhelm everything else.
*/

It's not so much a myth as a poor choice of words. getc() is such a
shitty interface that no matter how you slice and dice the
implementation, it won't get much better. A function call to get a
single character. How efficient. On top of that, if there is some
caching going on, that still adds compare-and-branch overhead to detect
whether there is any data left in the cache. No matter how you look at
it, it's plain ***.
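
To make the per-character cost concrete, here is a rough sketch of the
same job done both ways. The function names and the 4096-byte block
size are mine, picked for illustration; it only shows where the
per-byte check goes and makes no claim about actual timings.

```cpp
#include <cstddef>
#include <cstdio>

// Count newline characters one getc() call at a time.
// Every byte pays for the call plus the "anything left in the cache?" test.
std::size_t count_lines_getc(std::FILE* f)
{
    std::size_t lines = 0;
    int c;
    while ((c = std::getc(f)) != EOF)
        if (c == '\n') ++lines;
    return lines;
}

// Count newline characters by pulling whole blocks with fread().
// The "anything left?" test runs once per block, not once per byte.
std::size_t count_lines_block(std::FILE* f)
{
    char block[4096];  // block size is an arbitrary choice for this sketch
    std::size_t lines = 0;
    std::size_t got;
    while ((got = std::fread(block, 1, sizeof block, f)) > 0)
        for (std::size_t i = 0; i < got; ++i)
            if (block[i] == '\n') ++lines;
    return lines;
}
```

Both return the same count; the difference is only in how often the
library has to be asked for more data.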

Here's how I implement streaming. Say I am implementing a reader for
some file format, doesn't matter which. The file format has a header,
and the Plain Wrong way that is sometimes seen in source code is
this:

struct header
{
    uint32 bar;
    // .. stuff ...
};

header bla;
fread(&bla, sizeof(header), 1, file); // file: an already opened FILE*

The problem with this approach is that it doesn't cater for endianness.
Nor does it cater for padding if the struct is "incorrectly" written,
and sometimes it isn't even possible because of how the data is laid
out in the header. But that kind of crap is written all the time.
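
A small sketch of both failure modes, assuming a made-up on-disk layout
of one tag byte followed by a 4-byte integer. padded_header,
bytes_on_disk and raw_copy are hypothetical names, not from any real
file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical on-disk layout: 1 tag byte followed by a 4-byte integer,
// which makes 5 bytes in the file.
struct padded_header
{
    std::uint8_t  tag;
    std::uint32_t bar;
};

const std::size_t bytes_on_disk = 5;

// True when the compiler inserted padding, so sizeof() no longer
// matches what is actually stored in the file.
bool layout_differs_from_file()
{
    return sizeof(padded_header) != bytes_on_disk;
}

// Copies four raw bytes straight into an integer, which is effectively
// what fread() into a struct does. The value depends on host byte order.
std::uint32_t raw_copy(const unsigned char* bytes)
{
    std::uint32_t v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}
```

On common ABIs the struct typically comes out at 8 bytes, not 5, and
the bytes 12 34 56 78 come back as 0x12345678 on a big-endian host but
0x78563412 on a little-endian one.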

A step forward is the usual:

header bla;
bla.bar = (buffer[0] << 24) | (buffer[1] << 16) | ...; buffer += 4;

... where buffer points at the part of the file that has been read in,
say, as many bytes as the header size. This can be encapsulated:

bla.bar = read_uint32(buffer);
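
One possible shape for that helper, as a sketch only. It assumes the
file stores the value most significant byte first (matching the << 24
on the first byte above) and that the caller has already checked that
four bytes are available. The casts to an unsigned type are my
addition: with a plain char buffer, a byte of 0x80 or more could
otherwise sign-extend and trash the result.

```cpp
#include <cstdint>

// Hypothetical helper: decode a big-endian 32-bit value and advance the
// cursor. Each byte is widened to an unsigned 32-bit type before the
// shift, so a (possibly signed) char buffer cannot sign-extend.
inline std::uint32_t read_uint32(const char*& buffer)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer);
    std::uint32_t v = (std::uint32_t(p[0]) << 24)
                    | (std::uint32_t(p[1]) << 16)
                    | (std::uint32_t(p[2]) << 8)
                    |  std::uint32_t(p[3]);
    buffer += 4;  // same step as the "buffer += 4" in the hand-written version
    return v;
}
```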

Endianness is handled by this because << is endianness agnostic: it
works on the value, not on how the bytes are stored in memory. This
approach can be generalized into a generic *component*, and it can play
ball with streams if the stream has a read method which returns a
pointer to an internal (read-only) buffer:

const char* buffer = stream->read(size);

This buffer is assigned to a read proxy template:

inputfilter<little_endian> io = buffer;

Or,

typedef inputfilter<little_endian> xfilter;

xfilter xf = stream->read(size);

Now we are ready to extract data.

uint32 value = xf.read<uint32>();
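
For anyone wondering what could sit behind that interface, here is a
minimal sketch. Only the names inputfilter, little_endian and read<T>()
come from the snippets above; big_endian, decode and everything inside
the class are assumptions made for illustration, and it handles
unsigned integer types only.

```cpp
#include <cstddef>
#include <cstdint>

struct little_endian {};  // tag types: byte order of the data in the file
struct big_endian {};

// Shift-based decoders, one overload per byte order. sizeof(T) is a
// compile-time constant, so an optimizer is free to unroll the loops.
template <typename T>
T decode(const unsigned char* p, little_endian)
{
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v |= static_cast<T>(static_cast<T>(p[i]) << (8 * i));
    return v;
}

template <typename T>
T decode(const unsigned char* p, big_endian)
{
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        v = static_cast<T>((v << 8) | p[i]);
    return v;
}

template <typename Order>
class inputfilter
{
public:
    // Non-explicit on purpose, so "inputfilter<little_endian> io = buffer;"
    // compiles as written.
    inputfilter(const char* buffer)
        : cursor_(reinterpret_cast<const unsigned char*>(buffer)) {}

    // Decode one value of type T and step past it. No bounds checking:
    // the caller asked the stream for enough bytes up front.
    template <typename T>
    T read()
    {
        T v = decode<T>(cursor_, Order());
        cursor_ += sizeof(T);
        return v;
    }

private:
    const unsigned char* cursor_;
};
```

The byte order is part of the type, so picking the decoder costs
nothing at runtime; the tag argument only steers overload resolution.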

That's it. This is a 100% inlined method call. The mechanism that
implements this interface uses metaprogramming techniques, so code
generation is delayed until the template is instantiated. With Visual
C++ 2005 the code is generated at *linking* time if that option is
switched on (which it is, by default).

The assembly that results from this is a "mov" instruction. However, if
a lot of data is read and no endianness conversion is invoked, I have
seen the code collapse into a block memory copy, which is very close to
optimal. The optimal would be the first super-naive approach, which
malfunctions on many occasions, except that in this case it would be
invoked after static analysis by the compiler. :) Analysis is a little
bit too strong a word; it is rather the code generator noticing a
pattern. :)

Here's a cleaned up example without the explanation garbage:

void example(header& head, stream* s)
{
    xfilter xf = s->read(sizeof(header));
    head.bar = xf.read<uint32>();
    // ...
}

This isn't assembler, true, but I don't use assembler for *everything*.
This is a case where I wouldn't use assembler, because for me using an
HLL makes more sense for this kind of job. The interface is designed
while thinking in "close to the metal" terms, in other words, in
assembly. The overhead of stack-based function calls is eliminated, and
the overhead of deciding on endianness conversion at runtime is
eliminated. The overhead of the whole conversion is eliminated too
(note: no << based data extraction) when no conversion is needed, and I
still emphasize that the decision is made at compilation time. :)
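
Roughly how the compile-time decision can look, as a sketch rather than
my actual code. native_endian is a hypothetical tag meaning "the file
uses the same byte order as the host"; decoder and read_u32 are made-up
names as well.

```cpp
#include <cstdint>
#include <cstring>

struct native_endian {};  // hypothetical tag: file byte order == host byte order
struct big_endian {};

// Primary template left undefined: an unsupported order fails to compile.
template <typename Order> struct decoder;

// No conversion needed: a fixed-size memcpy, which optimizers commonly
// reduce to a single load.
template <> struct decoder<native_endian>
{
    static std::uint32_t u32(const unsigned char* p)
    {
        std::uint32_t v;
        std::memcpy(&v, p, sizeof v);
        return v;
    }
};

// Conversion needed: assemble the value with shifts.
template <> struct decoder<big_endian>
{
    static std::uint32_t u32(const unsigned char* p)
    {
        return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
             | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    }
};

// The order is a template argument, so the choice between the two
// bodies is made at instantiation; no runtime branch is emitted for it.
template <typename Order>
std::uint32_t read_u32(const unsigned char* p)
{
    return decoder<Order>::u32(p);
}
```

Whether the native case really ends up as one instruction depends on
the compiler and its settings, so check the generated assembly before
relying on it.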

Clearly a situation where assembly knowledge is a big help, because IO
is a fundamental building block of any application. But it doesn't mean
that you have to write the code IN assembly. :)

This is of course slightly off-topic, but I couldn't resist after that
getc() example! So basically, getc() isn't worth optimizing as it's a
fundamentally flawed design in the first place -- but -- just maybe --
it was never designed or meant to be blazing fast? That's a question
to which I neither know nor care to know the answer. :)
