Re: Efficient scather-gather-copy
- From: spamtrap@xxxxxxxxxx
- Date: Fri, 14 Apr 2006 06:09:26 +0000 (UTC)
In article <m3acapulyk@xxxxx> spamtrap@xxxxxxxxxx "Hynek Schlawack " writes:
"Mark Whitlock" <spamtrap@xxxxxxxxxx> writes:
I'm in the unlucky situation to convert mails from the
"\n"-lineendings to "\r\n"-ones.
You should probably just run these through a filter program. What
OS are you using?
It's UNIX and yet filtering is simply too slow. I'd rather use memcpy()
than external programs.
I can't see that memcpy() would buy you much here, but perhaps
I'm misunderstanding the problem.
If I'm not mistaken, I have to count the lines, alloc
sizeof(mail)+num_of_lines(mail), copy each line separately with a space
between them to the buffer and replace the "\n" with "\r\n".
How about opening 2 files, one for input, one for output.
Read the input; if it is not a newline, write it to the output.
If it is a newline, write return, then newline to the output.
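The read-and-write loop described above could look roughly like this in C. It is only a sketch: the function name lf_to_crlf_stream is made up for illustration, and it assumes the input has bare "\n" line endings (an existing "\r\n" would come out as "\r\r\n").

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out`, writing "\r\n" for every '\n'.
 * Returns 0 on success, -1 if a read or write error occurred. */
int lf_to_crlf_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        /* A newline is written as return, then newline. */
        if (c == '\n' && putc('\r', out) == EOF)
            return -1;
        if (putc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Called as lf_to_crlf_stream(stdin, stdout), this would behave as a plain filter program.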
In order to alloc() the buffer you need, you'll have to make two
passes through the buffer: one to count the newlines and one to
do the copying. The 2nd pass might be faster if it is cached,
but a single pass will almost certainly be faster overall.
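For comparison, the two-pass variant (count first, then allocate and copy) might be sketched as follows. The name xlate_two_pass and its out_len parameter are invented here, not anything from the thread, and the sketch assumes src is a valid non-NULL pointer and that len plus the newline count does not overflow size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-pass version: pass 1 counts the newlines, pass 2 copies.
 * Returns a malloc()ed buffer holding *out_len bytes, or NULL on failure. */
char *xlate_two_pass(const char *src, size_t len, size_t *out_len)
{
    size_t newlines = 0;
    const char *p = src;
    const char *end = src + len;
    char *dst, *d;

    /* Pass 1: count the '\n' bytes so the output size is known. */
    while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        newlines++;
        p++;
    }

    /* One spare byte so that an empty input still yields a usable buffer. */
    dst = malloc(len + newlines + 1);
    if (dst == NULL)
        return NULL;

    /* Pass 2: copy, inserting '\r' before each '\n'. */
    for (d = dst; src < end; src++) {
        if (*src == '\n')
            *d++ = '\r';
        *d++ = *src;
    }
    *out_len = (size_t)(d - dst);
    return dst;
}
```

This avoids the realloc() at the cost of touching the input twice, which is the trade-off in question.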
Well, I'm getting it as a buffer and anything that remotely could
involve access to file systems is out of question.
Do you know the size of the buffer? Or the size of the data
within the buffer? If so the worst case scenario would be a
buffer full of newlines and the output would have length (size*2)
so if resources permit, just alloc() a buffer of that size. The
pseudo-C code could be something like
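#include <stdlib.h>

/* Copy len bytes from src, inserting '\r' before each '\n'. */
char *xlate (const char *src, size_t len)
{
    /* Worst case: every byte is a newline, so the output is twice as long. */
    char *dst = malloc (len * 2);
    char *d;
    const char *end = src + len;

    if (dst == NULL)
        return NULL;
    for (d = dst; src < end; src++)
    {
        if (*src == '\n')
            *d++ = '\r';
        *d++ = *src;
    }
    /* Shrink the buffer to the number of bytes actually written. */
    return realloc (dst, (d - dst));
}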
[Obviously with some error checking!] See what asm the compiler
outputs and try to optimise that -- probably the loop, though
it's already pretty simple and the C compiler should make a
pretty good job of it anyway.
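With the error checking filled in, the single-pass idea might look something like the sketch below. The name xlate_checked and the out_len parameter are additions for illustration only; the caller needs the length because the result is not NUL-terminated.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical checked variant of the single-pass copy.
 * Returns a malloc()ed buffer holding *out_len bytes, or NULL on failure. */
char *xlate_checked(const char *src, size_t len, size_t *out_len)
{
    const char *end = src + len;
    char *dst, *d, *shrunk;
    size_t used;

    /* Guard against len * 2 overflowing size_t. */
    if (len > SIZE_MAX / 2)
        return NULL;
    /* Worst case is all newlines; ask for at least 1 byte so len == 0 works. */
    dst = malloc(len ? len * 2 : 1);
    if (dst == NULL)
        return NULL;

    for (d = dst; src < end; src++) {
        if (*src == '\n')
            *d++ = '\r';
        *d++ = *src;
    }
    used = (size_t)(d - dst);

    /* Shrink to fit; if realloc() fails the original block is still valid. */
    shrunk = realloc(dst, used ? used : 1);
    if (shrunk != NULL)
        dst = shrunk;
    *out_len = used;
    return dst;
}
```

The caller owns the returned buffer and should free() it when done.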
Imagine dozens of mails per _second_ that want to be
processed. Probably, memcpy() might be fast enough on a decent machine,
however it would definitely be a bottleneck. So I want to make it really
as fast as possible, as unfortunately there isn't much I can optimize
on the algorithm side.
If you have access to the input stream, perhaps look at injecting
the '\r's there on-the-fly?
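If the mail does arrive in pieces, the conversion can be applied to each piece as it comes in. A rough sketch, assuming the caller supplies an output buffer of at least 2 * n bytes; crlf_chunk is a made-up name, and pairing memchr() with memcpy() is just one way to copy whole runs between newlines instead of going byte by byte. Whether that beats the simple loop depends on line lengths and would need measuring.

```c
#include <string.h>

/* Hypothetical per-chunk converter. `out` must have room for 2 * n bytes.
 * Returns the number of bytes written to `out`. */
size_t crlf_chunk(const char *in, size_t n, char *out)
{
    const char *end = in + n;
    char *d = out;

    while (in < end) {
        /* Find the next newline in what is left of this chunk. */
        const char *nl = memchr(in, '\n', (size_t)(end - in));
        size_t run = (size_t)((nl ? nl : end) - in);

        /* Copy the run of ordinary bytes in one go. */
        memcpy(d, in, run);
        d += run;
        in += run;

        if (nl != NULL) {
            /* Replace the '\n' with "\r\n". */
            *d++ = '\r';
            *d++ = '\n';
            in++;
        }
    }
    return (size_t)(d - out);
}
```

Because each '\n' is handled on its own, no state has to be carried from one chunk to the next.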
This should stream pretty fast.
probably missing something,
Only, that "pretty fast" ain't "fast enough" in this case. ;)
Thanks for your reply,
-hs
Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."