Re: How to best parse a CSV data file and do a lookup in C?

Jens.Toerring_at_physik.fu-berlin.de
Date: 11/19/04


Date: 19 Nov 2004 12:00:41 GMT

Johnny Google <john_pataki@yahoo.com> wrote:
> Here is an example of the type of data from a file I will have:

> Apple,4322,3435,4653,6543,4652
> Banana,6934,5423,6753,6531
> Carrot,3454,4534,3434,1111,9120,5453
> Cheese,4411,5522,6622,6641

> The first position is the info (the product) I want to retrieve for the
> corresponding code. Assuming that the codes are unique for each product
> and all code data is on one line.

> So - I know the code is '9120' and I want to read the file line by line
> and build an array for each line separating on the commas.

> Is there simple way to do this using a library or something already
> created for reading simple data files?

> Trying to convert from my perl code to C ...

> In perl this would be about 10 or so lines...

> $prod_code = '9120';

> find_product($prod_code);

> sub find_product {

> $find_code = shift;
> foreach $line (read_file($datafile)) {
> @data = split /,/,$line; # split the line into an array
> foreach $code (@data) {
> # find a match from the list of elements
> if ($code == $find_code) {
> return $data[0]; # return the first element in the list
> } # end if
> } # end foreach
> } # end foreach

> } # end sub

Here's a program that should do the trick (well, modulo reading a
whole line and building an array of product IDs, but that doesn't
seem to be needed, going by your Perl code). The first restriction
is that the maximum length of the items you try to read must be
known in advance (it can be adjusted via the MAX_FIELD_WIDTH macro).
The second restriction is that it won't work correctly in all cases
if the last line doesn't end with a newline: the final field of
such a line is never compared. The third restriction is that no
additional whitespace may appear anywhere in the input file.

-----------8<----------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELD_WIDTH 128

int main( int argc, char *argv[ ] )
{
    FILE *fp;
    char prod_info[ MAX_FIELD_WIDTH + 1 ];
    char buf[ MAX_FIELD_WIDTH + 1 ];
    char fmt[ 30 ];
    char c;
    int prod_info_comes_next = 1;

    if ( argc < 3 )
    {
        fprintf( stderr, "Missing argument(s), need file name "
                 "and product ID\n" );
        return EXIT_FAILURE;
    }

    if ( ( fp = fopen( argv[ 1 ], "r" ) ) == NULL )
    {
        fprintf( stderr, "Can't open file '%s'\n", argv[ 1 ] );
        return EXIT_FAILURE;
    }

    sprintf( fmt, "%%%d[^,\n]%%c", MAX_FIELD_WIDTH );

    while ( fscanf( fp, fmt, buf, &c ) == 2 )
    {
        if ( c != '\n' && c != ',' )
            break;

        if ( prod_info_comes_next )
            strcpy( prod_info, buf );
        else
            if ( ! strcmp( buf, argv[ 2 ] ) )
            {
                printf( "Found product: %s\n", prod_info );
                fclose( fp );
                return EXIT_SUCCESS;
            }

        prod_info_comes_next = c == '\n' ? 1 : 0;
    }

    if ( ! feof( fp ) )
        fprintf( stderr, "Invalid input or read error\n" );
    else
        fprintf( stderr, "Product ID %s not found\n", argv[ 2 ] );

    fclose( fp );
    return EXIT_FAILURE;
}

-----------8<----------------------------------------

If you want to change that into a function to be used in a larger
program and you want to return the 'prod_info' array, make sure
that you either change its type to "static char" (otherwise the
memory for it will vanish the moment you leave the function) or
make it a char pointer and allocate memory for it in the function
(which the calling function then has to deallocate).
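As a rough sketch of the second option (allocating in the function),
something along these lines might do. The function name find_product()
is borrowed from your Perl code, and passing an already opened FILE
pointer instead of a file name is just an assumption made to keep the
example short; neither is part of the program above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELD_WIDTH 128

/* Hypothetical helper: scans the already opened stream 'fp' for
   'code' and returns a malloc()ed copy of the product info from
   the line the code is found on. Returns NULL if the code isn't
   found, on invalid input or if malloc() fails. The caller has
   to free() the returned string. */

char *find_product( FILE *fp, const char *code )
{
    char prod_info[ MAX_FIELD_WIDTH + 1 ];
    char buf[ MAX_FIELD_WIDTH + 1 ];
    char fmt[ 30 ];
    char c;
    int prod_info_comes_next = 1;

    sprintf( fmt, "%%%d[^,\n]%%c", MAX_FIELD_WIDTH );

    while ( fscanf( fp, fmt, buf, &c ) == 2 )
    {
        if ( c != '\n' && c != ',' )
            break;

        if ( prod_info_comes_next )
            strcpy( prod_info, buf );
        else if ( ! strcmp( buf, code ) )
        {
            /* Copy the local array to memory that survives
               leaving the function */

            char *result = malloc( strlen( prod_info ) + 1 );

            if ( result != NULL )
                strcpy( result, prod_info );
            return result;
        }

        prod_info_comes_next = c == '\n';
    }

    return NULL;
}
```

The caller would open the file, call find_product( fp, "9120" ),
print the result if it isn't NULL and then free() it. The same
restrictions as for the program above apply.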

The only slightly tricky bit is the format string for fscanf().
With a value of 128 for MAX_FIELD_WIDTH it will be "%128[^,\n]%c",
making fscanf() read at least one and at most 128 characters as
long as they are neither a comma nor a '\n', plus one more
character, which the program then checks to be a comma or a newline.
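To see what the format does in isolation, here's a small sketch that
applies it to a string with sscanf() instead of to a file. The helper
names build_field_format() and read_field() are made up for this
illustration and don't appear in the program above.

```c
#include <stdio.h>

#define MAX_FIELD_WIDTH 128

/* Hypothetical helper: writes the scan format into 'fmt', which
   must have room for at least 30 chars. With MAX_FIELD_WIDTH set
   to 128 the result is "%128[^,\n]%c". */

void build_field_format( char *fmt )
{
    sprintf( fmt, "%%%d[^,\n]%%c", MAX_FIELD_WIDTH );
}

/* Hypothetical helper: reads one field and the character following
   it from the start of 'src'. 'field' must have room for
   MAX_FIELD_WIDTH + 1 chars. Returns what sscanf() returns:
   2 if both the field and the following character were read,
   1 if the string ended directly after the field,
   0 if 'src' starts with a comma or newline (empty field). */

int read_field( const char *src, char *field, char *delim )
{
    char fmt[ 30 ];

    build_field_format( fmt );
    return sscanf( src, fmt, field, delim );
}
```

The return value of 1 is the case behind the second restriction
mentioned above: when the last line lacks a trailing newline the
final field is read but there's no character left for the "%c",
so the loop condition "== 2" fails before the field is compared.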

                                  Regards, Jens

-- 
  \   Jens Thoms Toerring  ___  Jens.Toerring@physik.fu-berlin.de
   \__________________________  http://www.toerring.de

