Re: Efficient scatter-gather-copy



In article <m3acapulyk@xxxxx> spamtrap@xxxxxxxxxx "Hynek Schlawack " writes:

"Mark Whitlock" <spamtrap@xxxxxxxxxx> writes:

I'm in the unlucky situation of having to convert mails from
"\n" line endings to "\r\n" ones.
You should probably just run these through a filter program. What
OS are you using?

It's UNIX and yet filtering is simply too slow. I'd rather use memcpy()
than external programs.

I can't see that memcpy() would buy you much here, but perhaps
I'm misunderstanding the problem.

If I'm not mistaken, I have to count the lines, alloc
sizeof(mail)+num_of_lines(mail) bytes, copy each line separately into
the buffer, leaving a gap between them, and replace each "\n" with "\r\n".
How about opening 2 files, one for input, one for output?
Read the input; if it is not a newline, write it to the output.
If it is a newline, write return, then newline to the output.
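
The two-file filter described above could look roughly like the sketch below. It assumes buffered stdio streams; the function name lf_to_crlf_stream is made up for illustration and does not come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every byte from in to out, writing '\r' before each '\n'.
   Returns 0 on success, -1 if a read or write error occurred.
   (lf_to_crlf_stream is a hypothetical name, not from the thread.) */
int lf_to_crlf_stream(FILE *in, FILE *out)
{
    int c;

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (c == '\n' && putc('\r', out) == EOF)
            return -1;
        if (putc(c, out) == EOF)
            return -1;
    }
    return (ferror(in) || fflush(out) == EOF) ? -1 : 0;
}
```

As a plain filter it would be called as lf_to_crlf_stream(stdin, stdout). Input that already contains "\r\n" would come out as "\r\r\n", so this only suits mail known to use bare "\n".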

In order to alloc() the buffer you need, you'll have to make two
passes through the buffer: one to count the newlines and one to
do the copying. The 2nd pass might be faster if it is cached,
but a single pass will almost certainly be faster overall.
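
For comparison, the two-pass variant (count first, then copy line by line with memcpy()) might be sketched as follows. The name xlate_two_pass and the out_len parameter are assumptions made for this example, and nothing here has been benchmarked against the single-pass approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two-pass LF -> CRLF conversion with an exactly sized output buffer.
   Stores the output length in *out_len and returns a malloc()ed buffer,
   or NULL if allocation fails. Name and signature are hypothetical. */
char *xlate_two_pass(const char *src, size_t len, size_t *out_len)
{
    const char *p, *nl, *end = src + len;
    size_t newlines = 0;
    char *dst, *d;

    assert(src != NULL && out_len != NULL);

    /* Pass 1: count the newlines so the exact size is known. */
    for (p = src; (nl = memchr(p, '\n', (size_t)(end - p))) != NULL; p = nl + 1)
        newlines++;

    /* +1 keeps the request non-zero for an empty input. */
    dst = malloc(len + newlines + 1);
    if (dst == NULL)
        return NULL;

    /* Pass 2: memcpy() each line body, then emit "\r\n" in place of '\n'. */
    for (p = src, d = dst; (nl = memchr(p, '\n', (size_t)(end - p))) != NULL; p = nl + 1) {
        size_t n = (size_t)(nl - p);
        memcpy(d, p, n);
        d += n;
        *d++ = '\r';
        *d++ = '\n';
    }
    /* Copy whatever follows the last newline. */
    memcpy(d, p, (size_t)(end - p));
    d += end - p;

    *out_len = (size_t)(d - dst);
    return dst;
}
```

The allocation is exact, but the input is scanned twice, which is the trade-off discussed above.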

Well, I'm getting it as a buffer and anything that remotely could
involve access to file systems is out of the question.

Do you know the size of the buffer? Or the size of the data
within the buffer? If so the worst case scenario would be a
buffer full of newlines and the output would have length (size*2)
so if resources permit, just alloc() a buffer of that size. The
pseudo-C code could be something like

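#include <stdlib.h>

char *xlate (char *src, size_t len)
{
    /* Worst case: every byte is '\n', so the output is at most len * 2. */
    char *dst = malloc (len * 2);
    char *d, *end = src + len;

    if (dst == NULL)
        return NULL;
    for (d = dst; src < end; src++)
    {
        if (*src == '\n')
            *d++ = '\r';    /* insert CR before each LF */
        *d++ = *src;
    }
    /* Shrink the buffer to the number of bytes actually written. */
    return realloc (dst, (d - dst));
}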

[Obviously with some error checking!] See what asm the compiler
outputs and try to optimise that -- probably the loop, though
it's already pretty simple and the C compiler should make a
pretty good job of it anyway.
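
One way the error checking mentioned above might be filled in is sketched below. The name xlate_checked and the out_len parameter are assumptions added for illustration; the caller needs some way to learn the output length, because the result is not NUL-terminated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Single-pass LF -> CRLF conversion into a worst-case sized buffer.
   Returns a malloc()ed buffer and stores its length in *out_len,
   or returns NULL on overflow or allocation failure.
   xlate_checked and out_len are hypothetical additions. */
char *xlate_checked(const char *src, size_t len, size_t *out_len)
{
    const char *end = src + len;
    char *dst, *d, *shrunk;

    assert(src != NULL && out_len != NULL);

    if (len > (SIZE_MAX - 1) / 2)       /* len * 2 + 1 would overflow */
        return NULL;
    dst = malloc(len * 2 + 1);          /* +1 avoids a zero-size request */
    if (dst == NULL)
        return NULL;

    for (d = dst; src < end; src++) {
        if (*src == '\n')
            *d++ = '\r';
        *d++ = *src;
    }
    *out_len = (size_t)(d - dst);

    /* Shrinking is optional: if realloc() fails, the original block is
       still valid, so keep it; never ask realloc() for zero bytes. */
    shrunk = realloc(dst, *out_len ? *out_len : 1);
    return shrunk ? shrunk : dst;
}
```

If the buffer is handed off and freed immediately, the final realloc() could probably be dropped altogether, saving a call per mail.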

Imagine dozens of mails per _second_ waiting to be
processed. memcpy() might well be fast enough on a decent machine,
but it would definitely be a bottleneck. So I want to make it really
as fast as possible, since unfortunately there isn't much I can optimize
on the algorithmic side.

If you have access to the input stream, perhaps look at injecting
the '\r's there on-the-fly?

This should stream pretty fast.
Probably missing something,

Only, that "pretty fast" ain't "fast enough" in this case. ;)

Thanks for your reply,
-hs

Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."
