Re: Searching for byte string in a binary file.
Jens.Toerring_at_physik.fu-berlin.de
Date: 05/16/04
- Next message: pete: "Re: Just how delicate are freed pointers?"
- Previous message: August Derleth: "Re: Just how delicate are freed pointers?"
- In reply to: Walter Dnes (delete the 'z' to get my real address): "Searching for byte string in a binary file."
- Next in thread: Walter Dnes (delete the 'z' to get my real address): "Ansi-portable TRUE/FALSE definition?"
- Reply: Walter Dnes (delete the 'z' to get my real address): "Ansi-portable TRUE/FALSE definition?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 16 May 2004 11:39:57 GMT
"Walter Dnes (delete the 'z' to get my real address)" <wzaltdnes@waltdnes.org> wrote:
> I took a C course some time ago, but I'm only now beginning to use it,
> for a personal pet project. My current stumbling-block is finding an
> efficient way to find a match between the beginning of a "counted"
> string and data in a binary file. Given...
> #include <stdio.h>
Also include <stdlib.h>, see below.
> int main(int argc, char *argv[])
> {
> char bstring[255];
> int bstring_length, match_length;
> long match_location;
> FILE *input_file, *output_file;
> if(argc != 3){
> printf(" Correct usage:\n %s input_filename output_filename\n", argv[0]);
> return 1;
Make that
return EXIT_FAILURE;
not all OSes return 1 for failure and 0 for success, and with the macros
EXIT_FAILURE and EXIT_SUCCESS, defined in <stdlib.h>, you're always on
the safe side.
Another common convention is to print error messages to stderr and not
mix them with the normal output of your program. So better use
fprintf( stderr, "Correct usage:\n %s input_filename output_filename\n",
argv[0]);
> }
> if((input_file = fopen(argv[1], "rb")) == NULL){
> printf("Error opening %s for input\n", argv[1]);
> return 1;
> }
> if((output_file = fopen(argv[2], "wb")) == NULL){
> printf("Error opening %s for output\n", argv[2]);
> return 1;
> }
> ...
> Later on, bstring and bstring_length get initialized. bstring may
> contain \0's, so the "logical length" (which will not exceed 255) is
> actually stored in bstring_length. What I'm trying to do is to find the
> location and length of the longest match from the left side of bstring.
> If that's not clear, here's an example...
> bstring[0] = 'a';
> bstring[1] = '\0';
> bstring[2] = 'b';
> bstring[3] = 'c';
> bstring[4] = 'd';
> bstring_length = 5;
> Assume that the sequence is not matched anywhere in the file. However,
> assume that the sequence 'a', '\0', 'b', 'c' does exist in the file.
> What's the most efficient search that returns match_length ( 4 ) and
> ftell() of the beginning of the match? If it's a function, should I
> declare in main() like so?
> int bstring_length, *match_length;
> long *match_location;
> I assume that the function prototype would be something like...
> int findmatch( bstring[255], bstring_length, match_length, match_location);
Sorry, that's not a prototype.
> By passing pointers, I assume that both match_location and
> match_length will be modified and available to main(). Or is my
> understanding of pointers lacking. findmatch() should return 0 if a
> match is found, and 1 if no match.
When you want to set 'match_location' within find_match() to a long
value that's visible in main() you need to change that a bit. You
must define 'match_location' in main() as long, i.e. have there
long match_location;
and then pass a pointer to that variable to the function. So your
prototype for the function would look like
int find_match( char buf[ 255 ], int blen, int mlen, long *loc );
(I intentionally changed the names to be used within the function
from the ones you defined in main() to make it more obvious what
belongs to main() and what belongs to find_match().) If you don't
intend to change the char array within the function it probably
would be better to make the first argument "const char buf[ 255 ]".
You now would call that function like
find_match( bstring, bstring_length, match_len, &match_location );
(please note the '&' in front of 'match_location', it tells that
you pass the address of 'match_location' and not its value).
Within find_match() you would then assign the position of the match
to '*loc', i.e. you write it into the memory location 'loc' points
to.
By the way, using names like 'bstring' and 'bstring_length' is a bit
misleading since 'bstring' isn't a string (a real string can't have
embedded '\0' characters), it's just an array of chars. So it might
be prudent to give it a name that doesn't make people assume that
it's going to be a string.
> Would memory-mapped files help here?
The rest of your questions on how to do the matching in the fastest,
most effective way are not really C questions but more about what
kind of algorithm to use. That's better discussed in groups like
comp.programming since it really doesn't is is about C. And a google
search for e.g. "Boyer-Moore algorithm" (just to name one) will
probably come up with a lot of interesting links (and show you
how many possible algorithms have been developed for that problem
over the years. E.g.
http://www-igm.univ-mlv.fr/~lecroq/string/
looks rather interesting). Finally, questions about memory-mapping
files are off-topic here because that can only be done with system-
specific extensions to C.
Regards, Jens
-- \ Jens Thoms Toerring ___ Jens.Toerring@physik.fu-berlin.de \__________________________ http://www.toerring.de
- Next message: pete: "Re: Just how delicate are freed pointers?"
- Previous message: August Derleth: "Re: Just how delicate are freed pointers?"
- In reply to: Walter Dnes (delete the 'z' to get my real address): "Searching for byte string in a binary file."
- Next in thread: Walter Dnes (delete the 'z' to get my real address): "Ansi-portable TRUE/FALSE definition?"
- Reply: Walter Dnes (delete the 'z' to get my real address): "Ansi-portable TRUE/FALSE definition?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|