Re: Newbie C question - random access.



santosh wrote:
Grinner wrote:
"Grinner" <grinner@xxxxxxxxxxx> wrote in message

In C is there a function which will allow you to quickly locate
a search value and not have to sequentually read a file or do
you have to code your own search given that files are treated
as sequential requiring code to binary divide the file until
the key value is in range.?

I should preface that with . . Assuming the file is sorted.

What is "sorted" within the file? If the file is binary, the
bsearch function may do what you want. On the other hand, if it's
text, you can use the strstr function to locate your string.

In any case you'll have to read the file into memory, (or atleast
a part of it), if you want the search to be fast. C's streams do
some internal buffering, but it may not be good enough. Using the
operating system's memory mapping facility, if present, may also
be worthwhile. It really depends on the file type, it's internal
format, it's size, nature of typical search keys, how often
you'll have to search etc.

I've published this before on c.l.c.

/*
Leor Zolman wrote:
On 25 Feb 2004 07:34:40 -0800, joan@xxxxxxxxx (spike) wrote:

Im trying to write a program that should read through a binary
file searching for the character sequence "\name\"

Then it should read the characters following the "\name\"
sequence until a NULL character is encountered.

But when my program runs it gets a SIGSEGV (Segmentation
vioalation) signal.

Whats wrong? And is there a better way than mine to solve
this task (most likely)

I think so. Here's a version I just threw together:
*/

/* And heres another throw -- binfsrch.c by CBF */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <assert.h>

/* The difference between a binary and a text file, on read,
is the conversion of end-of-line delimiters. What those
delimiters are does not affect the action. In some cases
the presence of 0x1a EOF markers (MsDos) does.

This is a version of Knuth-Morris-Pratt algorithm. The
point of using this is to avoid any backtracking in file
reading, and thus avoiding any use of buffer arrays.
*/

size_t chrcount; /* debuggery, count of input chars, zeroed */

/* --------------------- */

/* Almost straight out of Sedgewick */
/* The next array indicates what index in id should next be
compared to the current char. Once the (lgh - 1)th char
has been successfully compared, the id has been found.
The array is formed by comparing id to itself. */
void initnext(int *next, const char *id, int lgh)
{
int i, j;

assert(lgh > 0);
next[0] = -1; i = 0; j = -1;
while (i < lgh) {
while ((j >= 0) && (id[i] != id[j])) j = next[j];
i++; j++;
next[i] = j;
}
#if (0)
for (i = 0; i < lgh; i++)
printf("id[%d] = '%c' next[%d] = %d\n",
i, id[i], i, next[i]);
#endif
} /* initnext */

/* --------------------- */

/* reads f without rewinding until either EOF or *marker
has been found. Returns EOF if not found. At exit the
last matching char has been read, and no further. */
int kmpffind(const char *marker, int lgh, int *next, FILE *f)
{
int j; /* char position in marker to check */
int ch; /* current char */

assert(lgh > 0);
j = 0;
while ((j < lgh) && (EOF != (ch = getc(f)))) {
chrcount++;
while ((j >= 0) && (ch != marker[j])) j = next[j];
j++;
}
return ch;
} /* kmpffind */

/* --------------------- */

/* Find marker in f, display following printing chars
up to some non printing character or EOF */
int binfsrch(const char *marker, FILE *f)
{
int *next;
int lgh;
int ch;
int items; /* count of markers found */

lgh = strlen(marker);
if (!(next = malloc(lgh * sizeof *next))) {
puts("No memory");
exit(EXIT_FAILURE);
}
else {
initnext(next, marker, lgh);
items = 0;
while (EOF != kmpffind(marker, lgh, next, f)) {
/* found, take appropriate action */
items++;
printf("%d %s : \"", items, marker);
while (isprint(ch = getc(f))) {
chrcount++;
putchar(ch);
}
puts("\"");
if (EOF == ch) break;
else chrcount++;
}
free(next);
return items;
}
} /* binfsrch */

/* --------------------- */

int main(int argc, char **argv)
{
FILE *f;

f = stdin;
if (3 == argc) {
if (!(f = fopen(argv[2], "rb"))) {
printf("Can't open %s\n", argv[2]);
exit(EXIT_FAILURE);
}
argc--;
}
if (2 != argc) {
puts("Usage: binfsrch name [binaryfile]");
puts(" (file defaults to stdin text mode)");
}
else if (binfsrch(argv[1], f)) {
printf("\"%s\" : found\n", argv[1]);
}
else printf("\"%s\" : not found\n", argv[1]);
printf("%lu chars\n", (unsigned long)chrcount);
return 0;
} /* main binfsrch */

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews


.



Relevant Pages

  • Re: creating a pointer to a imbeded struct causes C2065
    ... The whole sequence of code below is so deeply suspect, ... What is the purpose of a 'char' array of fixed size? ... typedef struct T_Smoo ...
    (microsoft.public.vc.mfc)
  • Re: Evaluation of C program
    ... because the 10th character in the trial substring does not match ... compared to the current char. ... Once the (lgh - 1)th char ... void initnext(int *next, const char *id, int lgh) ...
    (comp.lang.c)
  • Re: help with a jpg-file needed
    ... compared to the current char. ... Once the (lgh - 1)th char ... void initnext(int *next, const char *id, int lgh) ... int kmpffind(const char *marker, int lgh, int *next, FILE *f) ...
    (comp.os.linux.misc)
  • Re: Write to file
    ... You can read the file one character at a time. ... compared to the current char. ... Once the (lgh - 1)th char ... void initnext(int *next, const char *id, int lgh) ...
    (comp.lang.c)
  • Re: Algorithm Question
    ... compared to the current char. ... Once the (lgh - 1)th char ... void initnext(int *next, const char *id, int lgh) ... int kmpffind(const char *marker, int lgh, int *next, FILE *f) ...
    (comp.programming)