Re: Very slow



On 13/1/2012 12:05 AM, Kaz Kylheku wrote:
On 2012-01-12, George Mpouras <nospam.gravitalsun@xxxxxxxxxxxxxxxxxx> wrote:
Create a test file with 20000000 identical lines of 50 commas each (it will be
1020000000 bytes):
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
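
One way to produce such a file is sketched below. The function name write_test_lines is hypothetical and not from the thread; it simply writes the requested number of identical comma lines to an already open stream.

```c
#include <stdio.h>

/* Hypothetical helper, not from the original post: write `lines`
   identical lines, each consisting of `commas` commas followed by
   a newline. Returns 0 on success, -1 on a write error. */
int write_test_lines(FILE *out, long lines, int commas)
{
    for (long i = 0; i < lines; i++) {
        for (int j = 0; j < commas; j++) {
            if (putc(',', out) == EOF)
                return -1;
        }
        if (putc('\n', out) == EOF)
            return -1;
    }
    return 0;
}
```

Calling write_test_lines(stdout, 20000000L, 50) from a small main and redirecting the output to a file should give the 1020000000 bytes mentioned above, since each line is 50 commas plus one newline (20000000 x 51).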

Now I have a Perl and a C program running almost the same code. Perl needs
about 6 minutes to finish, while the C version finishes in 20 seconds.
The difference is huge. What can I do to make the Perl version faster?
The two programs follow.


--== C ==--


#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *line = NULL;
    char *p = NULL;
    int NC = 1;
    int n = 0;
    size_t len = 0;
    ssize_t read;

Unused variable: read is never used.

    while (getline(&line, &len, stdin) != -1) {
        n = 1;
        for (p = line; *p != '\0'; p++) {
            if (*p == ',') { n++; }
            else if (*p == '\\') { p++; }

The buffer returned by the getline function only includes the newline character
if one occurs in the data. What if the last line has no newline?

Then, by dumb luck, you could end up with a string that ends in a backslash,
like "...\".

Then you increment the pointer past the backslash, and now it points to the
null terminator. But then the for loop's increment step will increment the
pointer again, beyond the end of the string. Then the *p != '\0' test is
applied to an invalid pointer.
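
A minimal sketch of one way to close that hole, assuming the same counting rules as the loop above: only skip the escaped character when there actually is one. The function name count_fields is hypothetical and used here just to isolate the loop.

```c
/* Hypothetical helper: count comma-separated fields in a
   NUL-terminated line. A backslash escapes the character that
   follows it, if there is one. */
int count_fields(const char *line)
{
    int n = 1;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == ',') {
            n++;
        } else if (*p == '\\' && p[1] != '\0') {
            p++;    /* skip the escaped character, never the terminator */
        }
    }
    return n;
}
```

With this guard, a trailing backslash is treated as an ordinary character, and the loop's own p++ lands on the terminator instead of stepping past it.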

        }
        if (n > NC) {
            NC = n;
        }
    }
    if (line) free(line);

You don't have to test for null because free(NULL) is well-defined behavior.

This line is a waste of cycles anyway if you know that the OS cleans up the
virtual memory in one fell swoop. You're just twiddling around some pointers
inside malloc's heap, when a moment later, that entire heap is going to be
history.

Releasing all memory is good in a debug build of a program, just to show that
you /can/ do it, and to help with hunting down leaks.
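
As a rough sketch of that idea, the cleanup can be made conditional on a build flag. Both the helper name release_at_exit and the macro FREE_AT_EXIT are made up for this example; nothing like them appears in the original program.

```c
#include <stdlib.h>

/* Hypothetical helper: free `buf` only when the program is compiled
   with -DFREE_AT_EXIT (say, in a debug or leak-checking build).
   Returns 1 if the buffer was released, 0 otherwise. */
int release_at_exit(void *buf)
{
#ifdef FREE_AT_EXIT
    free(buf);      /* free(NULL) is a no-op, so no null test is needed */
    return 1;
#else
    (void)buf;      /* normal build: the OS reclaims the heap at exit */
    return 0;
#endif
}
```

In the program above, the call would replace the "if (line) free(line);" line just before the final printf.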

    printf("%d\n", NC);
    return EXIT_SUCCESS;
}

There is no need to dynamically allocate buffers to hold an entire
line. Try this one:

#include <stdio.h>

/* Look Ma, no malloc, no getline, no _GNU_SOURCE.
   Not even a single pointer declaration. */

int main(void)
{
    int NC = 1;
    int n = 1;
    enum state { init, esc, done } state = init;

    while (state != done) {
        int ch = getchar();
        switch (ch) {
        case '\n': case EOF:
            if (n > NC)
                NC = n;
            n = 1;
            state = (ch == EOF) ? done : init;
            break;
        case '\\':
            switch (state) {
            case init: state = esc; break;
            case esc: state = init; break;
            }
            break;
        case ',':
            if (state == init)
                n++;
            /* fallthrough */
        default:
            state = init;
            break;
        }
    }
    printf("%d\n", NC);
    return 0;
}

Thanks, Kaz.