Re: Search a binary file for a string... again! (its to slow)
From: Nick Landsberg (hukolau_at_NOSPAM.att.net)
Date: 02/28/04
- Next message: Joona I Palaste: "Re: 0a appending w/strcpy"
- Previous message: Joona I Palaste: "Re: Press enter to continue"
- In reply to: Malcolm: "Re: Search a binary file for a string... again! (its to slow)"
- Next in thread: Barry Schwarz: "Re: Search a binary file for a string... again! (its to slow)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 28 Feb 2004 22:15:40 GMT
Malcolm wrote:
> "spike" <joan@ljungh.se> wrote in message
>
>>Im writing a program to search for a string in a binary file.
>>And it works. The problem is: It is sooo slow! how can i make it
>>faster?
>>It takes 27 seconds just to search a 5 meg file.
>>I guess it has something to do with the strequal() function...
>>
>>-------------------------------------------------------------------------
>>#include <stdio.h>
>>#define MAXLEN 50
>>#define FALSE 0
>>#define TRUE !FALSE
>>#define STRSIZE 4
Remember the above define! We'll come back to it
later.
>>
>>int strequal(char sText[],char sBuff[])
>>{
>> int iRetval = TRUE;
>> int i;
>> for(i=0;i<STRSIZE;i++)
>> {
>> if(sText[i] == sBuff[i])
>> {
>> iRetval = iRetval && TRUE;
>> }
>> else
>> {
>> iRetval = iRetval && FALSE;
>> }
>> }
>> return iRetval;
>>}
>>
>
> You can optimise this by writing it
>
> int mystrncmp(const char *s1, const char *s2, int n)
> {
> int i;
>
> for(i=0;i<n;i++)
> if(s1[i] != s2[i])
> return s2[i] - s1[i];
> return 0;
> }
> This will save some cycles. You can also replace with the stdlib function
> strncmp(), which may be even faster as coded in assembly.
>
>>int main()
>>{
>> FILE *fp;
>> char sText[STRSIZE] = "name";
Latent bug in the code here. STRSIZE was defined
as 4. "name" is a 5 character string including
the NULL byte '/0' at the end. Therefore a NULL was
written *somewhere* (implementation specific).
It is very possible (but not guaranteed) that
the compiler allocated storage for sText and
sBuff right after each other.
If that was the case, the NULL byte
was written into the first character of sBuff.
Later on, when you read data into sBuff, that
NULL byte would be overwritten with some character
from the file and sText would no longer be
a null-terminated string. This would make
for some "interesting" behavior and probably
prevent you from using any of the standard library
functions.
Take the advice of the other posters about
using standard library functions and larger buffer
sizes, too.
>> char sBuff[STRSIZE];
>> unsigned long pos;
>>
>> if((fp = fopen("demo.dem","rb")) != NULL)
>> {
>> while ((fread(&sBuff, sizeof(char), sizeof(sText), fp)) >=
>
> (sizeof(sText)))
>
> You might save a bit of time by simply calling fgetc(), which is designed to
> return single characters.
>
>> {
>>
>>
>
> Most of the time the first character in the buffer won't match the first
> character in the string. So you can avoid the overhead of the function call
> by simply rejecting it.
>
>> if(strequal(sText,sBuff))
>> {
>> printf("a match!\n");
>> }
>> fgetpos(fp, &pos);
>> pos = pos-(STRSIZE-1);
>> fsetpos(fp, &pos);
>>
>
> You are doing this on each character read. This will slow you down.
The comment was absolutely correct. (Sorry, I don't
recall who made it.) You are (conceptually) reading
4 characters, then taking 3 steps back, and reading
4 characters again, 3 of which you have already read
once. While this works, your question was about
speed, and this sequence above looks like a real
pig.
>
>> }
>> fclose(fp);
>> }
>> else
>> {
>> printf("Could not open file");
>> }
>> return 0;
>>}
>>
>
> The way to speed up is to have a more sophisticated buffer.
>
> Read characters until you come up with a match in the first position.
> Then read enough characters for the string into the buffer.
> If you have a match, you have a match.
>
> Otherwise, search the buffer for the next matching character to the first
> letter, shift left, and read more in. If the alphabet is large and the
> string quite short you should frequently flush the buffer, in which case you
> go back to reading characters one at a time.
>
> Shifting characters is expensive, but a circular buffer involves the modulus
> operation, which is also expensive. To get even more speed, you can declare
> an oversize buffer, as large as you like, and keep a travelling pointer to
> the start position. Only when you hit the end do you need to move some
> characters to the beginning and reset the start pointer. (If the buffer
> empties then of course you go back to the initial start position for free).
>
>
-- Ñ "It is impossible to make anything foolproof because fools are so ingenious" - A. Bloch
- Next message: Joona I Palaste: "Re: 0a appending w/strcpy"
- Previous message: Joona I Palaste: "Re: Press enter to continue"
- In reply to: Malcolm: "Re: Search a binary file for a string... again! (its to slow)"
- Next in thread: Barry Schwarz: "Re: Search a binary file for a string... again! (its to slow)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|