Re: How to best parse a CSV data file and do a lookup in C?

From: Robert Gamble (rgamble99_at_gmail.com)
Date: 11/20/04


Date: Fri, 19 Nov 2004 19:34:22 -0500

On Fri, 19 Nov 2004 15:58:40 -0800, Paul Hsieh wrote:

> Robert Gamble <rgamble99@gmail.com> wrote:
>> On Thu, 18 Nov 2004 23:48:46 -0500, Robert Gamble wrote:
>> > On Thu, 18 Nov 2004 20:23:24 -0800, Johnny Google wrote:
>> >> The main point here is I need help with either an already created
>> >> library which handles data files like this - or some C equivalent to
>> >>
>> >> @data = split /,/, $line;
>> >>
>> >>
>> >> Since I think I know how to read the contents, and loop through the
>> >> array in C - and do the strcmp to test if it matches - I just don't
>> >> know how to simply break the line into an array by telling it what
>> >> the delimiter is (as in the above split example).
>
> Ok, Johnny Google. I've written a usable CSV parser here:
>
> http://www.pobox.com/~qed/bcsv.zip
>
> It, in fact, can parse out the complete CSV standard (which includes
> quoting so you can put commas, leading/trailing spaces and in fact
> quotes themselves in any given field) and thus should easily be able
> to handle that case you are concerned with.
>
>> > The only thing provided by Standard C for this purpose is strtok which
>> > will probably work fine for your needs. Take a look at that and come back
>> > if you have any questions about it.
>
> Using strtok() is rarely ever the right solution. strtok() includes a
> hidden side effect which makes using it in a reentrant way
> impossible. Correct CSV parsing, in fact, does *NOT* reduce to a
> simple line-by-line, then strtok with a "," separator (though it may
> match the sample data given by the OP.)

It may rarely be the right solution, but it will work fine here. I don't
know what is "hidden" about the side effect, as it is well documented and I
pointed out the gotchas.

True CSV parsing is admittedly more complicated, but based on the sample
data and the code provided by the OP, strtok is a fine solution.
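
For illustration, here is a minimal sketch of the strtok approach, roughly
the C counterpart of the Perl split quoted above. The helper name
split_commas and its parameters are made up for this example, and it assumes
simple data with no quoted fields and no empty fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original thread): split `line` in place
 * on commas and store up to `max` field pointers in `fields`.
 * Returns the number of fields stored.
 *
 * Gotchas inherited from strtok:
 *   - `line` is modified (each delimiter found is overwritten with '\0'),
 *     so it must not be a string literal;
 *   - runs of delimiters are treated as one, so empty fields are skipped;
 *   - strtok keeps internal state, so calls cannot be nested or shared
 *     between threads.
 */
size_t split_commas(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && fields != NULL);

    for (tok = strtok(line, ","); tok != NULL && n < max;
         tok = strtok(NULL, ","))
        fields[n++] = tok;

    return n;
}
```

The caller passes a writable buffer (for example one filled by fgets, with
the trailing newline removed) and can then loop over fields[0..n-1] with
strcmp to do the lookup.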

> You should use strcspn() instead -- it essentially provides the same
> functionality as strtok() without its negative side effects (strcspn()
> has no effect on reentrancy).

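Not even close. The strcspn function only returns the length of the initial
part of a string that contains none of the given characters; it neither
modifies the string nor remembers a position between calls. It can be used
to write a strtok-like function but does not provide the same functionality
by itself and is certainly not a replacement.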
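
To make the distinction concrete, here is a rough sketch of a strtok-like
function built on top of strcspn. The name next_field and the caller-held
cursor are assumptions made for this example, not anything from the thread
or from Paul Hsieh's library. It is similar in spirit to the non-standard
strsep, and it still does not handle CSV quoting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the next field starting at *cursor, where a
 * field ends at any character in `delims` or at the end of the string.
 * The delimiter is overwritten with '\0' and *cursor is advanced past it.
 * Returns NULL once the input has been used up.
 *
 * All state lives in *cursor, so independent scans can be interleaved.
 * Unlike strtok, adjacent delimiters yield empty fields.
 */
char *next_field(char **cursor, const char *delims)
{
    char *start;
    size_t len;

    assert(cursor != NULL && delims != NULL);

    start = *cursor;
    if (start == NULL)
        return NULL;

    len = strcspn(start, delims);   /* chars before the first delimiter */
    if (start[len] != '\0') {
        start[len] = '\0';          /* terminate this field */
        *cursor = start + len + 1;  /* resume after the delimiter */
    } else {
        *cursor = NULL;             /* that was the last field */
    }
    return start;
}
```

Typical use is to set a char pointer to the start of a writable line and call
next_field(&p, ",") in a loop until it returns NULL. The extra bookkeeping
around the single strcspn call is exactly what strcspn does not do on its own.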

Rob Gamble


