Re: How to best parse a CSV data file and do a lookup in C?

From: pete (pfiland_at_mindspring.com)
Date: 11/20/04


Date: Sat, 20 Nov 2004 12:34:14 GMT

Robert Gamble wrote:
>
> On Fri, 19 Nov 2004 15:58:40 -0800, Paul Hsieh wrote:
>
> > Robert Gamble <rgamble99@gmail.com> wrote:
> >> On Thu, 18 Nov 2004 23:48:46 -0500, Robert Gamble wrote:
> >> > On Thu, 18 Nov 2004 20:23:24 -0800, Johnny Google wrote:
> >> >> The main point here is I need help with either an already created
> >> >> library which handles data files like this - or some C equivalent to
> >> >>
> >> >> @data = split /,/, $line;
> >> >>
> >> >>
> >> >> Since I think I know how to read the contents, and loop through the
> >> >> array in C - and do the strcmp to test if it matches - I just don't
> > >> >> know how to simply break the line into an array by telling it what
> >> >> the delimiter is (as in the above split example).
> >
> > Ok, Johnny Google. I've written a usable CSV parser here:
> >
> > http://www.pobox.com/~qed/bcsv.zip
> >
> > It, in fact, can parse out the complete CSV standard (which includes
> > quoting so you can put commas, leading/trailing spaces and in fact
> > quotes themselves in any given field) and thus should easily be able
> > to handle that case you are concerned with.
> >
> >> > The only thing provided by Standard C for this purpose is strtok which
> >> > will probably work fine for your needs. Take a look at that and come back
> >> > if you have any questions about it.
> >
> > Using strtok() is rarely ever the right solution. strtok() includes a
> > hidden side effect which makes using it in a reentrant way impossible.
>
> It may rarely be the right solution, but it will work fine here.
> I don't know what is "hidden" about the side effect as it is well
> documented and I pointed out the gotchas.

The standard doesn't single out strtok with respect to reentrancy;
the same general rule covers every library function:

       7.1.4 Use of library functions
       [#4] The functions in the standard library are not
       guaranteed to be reentrant and may modify objects with
       static storage duration.

> > You should use strcspn() instead -- it essentially provides the same
> > functionality as strtok() without its negative side effects
> > (strcspn() has no effect on reentrancy).
>
> Not even close. The strcspn function can be used to write a
> strtok-like function but does not provide the same functionality by
> itself and is certainly not a replacement.

strchr can be used to write strcspn- and strspn-like functions.
A strchr-like function can be written in the C language proper.
A reentrant version of a strtok-like function can be written
with an extra parameter that points to the object
which takes the place of the static object in strtok.

char *str_tok_r(char *s1, const char *s2, char **s3);

-- 
pete