Re: searching data for a large set of substrings

From: pete (pfiland_at_mindspring.com)
Date: 10/03/04


Date: Sun, 03 Oct 2004 13:25:47 GMT

C3 wrote:
>
> I have to process some data in C
> that is given to me as a char * array. I
> have a fairly large number of substrings (well, they're not actually
> printable, but let's treat them as strings)
> that I have to search for in the data.
>
> I need to keep a count of how often each of these
> substrings occurs in my original data and
> then print it out at the end.
>
> This is a fairly mundane task,
> but since I have so many substrings, it's a
> pain having to #define them all.
> Can anybody suggest an efficient way of
> doing this task?
>
> BTW, I have to use C. If I could use Perl, I'd be in heaven.
 

/* BEGIN new.c output */

The substring "string", occurs 4 times.
The substring "C", occurs 3 times.
The substring "kibo", occurs 0 times.
The substring "hen", occurs 1 times.
The substring "of", occurs 5 times.

/* END new.c output */

/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

int main(void)
{
    struct substring {
        char *string;
        long unsigned count;
    } substring[] = {{"string"},{"C"},{"kibo"},{"hen"},{"of"}},
        *str_ptr;
    char *data[] = {
        "I have to process some data in C that is given to me "
        "as a char * array.\n I have a fairly large number of "
        "substrings\n(well, they're not actually printable, "
        "but let's treat them as strings)\nthat I have to search "
        "for in the data.",
        "I need to keep a count of how often each of these "
        "substrings occurs in my original data and then print it "
        "out at the end.",
        "This is a fairly mundane task, but since I have so many "
        "substrings, it's a pain having to #define them all.\n"
        "Can anybody suggest an efficient way of doing this task?\n",
        "BTW, I have to use C.\n"
        "If I could use Perl, I'd be in heaven.\ncheers,"
    };
    size_t nstring, ndata;
    char **data_ptr, *ptr;
    long unsigned count;
    
    nstring = sizeof substring / sizeof *substring;
    for (str_ptr = substring; nstring-- != 0; ++str_ptr) {
        count = 0;
        data_ptr = data;
        ndata = sizeof data / sizeof *data;
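        while (ndata-- != 0) {
            /* Find the first match in this block of data, if any. */
            ptr = strstr(*data_ptr, str_ptr -> string);
            while (ptr != NULL) {
                ++count;
                /* Resume one char past the match start,
                   so overlapping matches are counted too. */
                ptr = strstr(ptr + 1, str_ptr -> string);
            }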
            ++data_ptr;
        }
        str_ptr -> count = count;
    }
    puts("/* BEGIN new.c output */\n");
    nstring = sizeof substring / sizeof *substring;
    for (str_ptr = substring; nstring-- != 0; ++str_ptr) {
        printf("The substring \"%s\", occurs %lu times.\n",
            str_ptr -> string, str_ptr -> count);
    }
    puts("\n/* END new.c output */");
    return 0;
}

/* END new.c */

-- 
pete

