Re: reading csv files

From: Anthony Borla (ajborla_at_bigpond.com)
Date: 01/03/04


Date: Sat, 03 Jan 2004 13:20:37 GMT


"less sexy" <less@sexy.com> wrote in message
news:WdKdnfY5IpVfomuiRVn-gg@comcast.com...
>
> "Anthony Borla" <ajborla@bigpond.com> wrote in message
> news:azeJb.74531$aT.2251@news-server.bigpond.net.au...
> >
> >
> [snip]
>
> > Note though the reason your code is not working is
> > because the ',' character is not being extracted from the
> > stream, so, after the first name is extracted, the subsequent
> > input operations simply read up to the same ',' character,
> > returning no characters to your program. It will have,
> > effectively, entered an endless loop situation. You either need
> > to take express action by taking an extra step to remove the
> > extraneous character via:
> >
> > in.get (name, NAME_LEN + 1, ','); in.get();
> >
> > or using an alternate means of input, one that *does* extract
> > the delimiter character:
> >
> > in.getline (name, NAME_LEN + 1, ',');
> >
> [snip]
>
> thanks everyone for your help, especially Anthony..
>

No worries - glad to be of help :)

>
> I had a lot of trouble when I added a second or more lines
> to the csv file. It wouldn't move past the first line, or if it
> did the values got out of order as I wrote them to a struct.
>
> I had a file of:
>
> char, long, float, float, float, long
> char, long, float, float, float, long
>
> This must have something to do with there not being a
> ',' at the end of the line.
>

The modified code was meant only to show you how to handle the 'troublesome'
delimiter character, and not to provide a complete solution to reading a CSV
stream. As the other respondent to your query - Sumit Rajan - mentioned:

    "You will also need to take care of newlines in your file."

It sounds as if you did not do this [among, possibly, other things], hence
the problems you experienced. I recommend that, aside from the tutorial,
you also consult suitable C++ Standard Library documentation - a Google
search should help here.

>
> I tried "in.ignore(10000, '/n');", and in.get(); to force a move
> to the next line but that didn't work.
>

A typographical error, here:

    in.ignore(10000, '/n');

instead of:

    in.ignore(10000, '\n');

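was your undoing ! The character constant '/n' [forward slash] is not the
newline character, so 'in.ignore' never found the end of the line and simply
discarded up to 10000 characters, or everything up to end-of-file.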
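
To illustrate the corrected call, here is a minimal sketch. It assumes the
data arrives on any 'std::istream'; the function name 'firstFieldPerLine' is
made up for the example, and 'std::numeric_limits' is swapped in for the
10000 'magic number' so that a line of any length is skipped.

```cpp
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying this out
#include <string>
#include <vector>

// Hypothetical helper: collect the first comma-delimited field of every
// line, discarding the remainder of each line with ignore().
std::vector<std::string> firstFieldPerLine(std::istream& in)
{
    std::vector<std::string> fields;
    std::string field;

    // Extract up to, and including, the first ',' of the current line
    while (std::getline(in, field, ','))
    {
        fields.push_back(field);

        // Discard everything up to, and including, the newline. Note the
        // backslash: '\n' is the newline character, whereas '/n' is a
        // multi-character constant that does not match it.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }

    return fields;
}
```

Passing it a 'std::istringstream' holding a few lines of test data is an easy
way to experiment before pointing it at a real file.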

>
> I solved the problem by using a getline without
> deliminators to grab a full line of text and then using strtok
> to break the line up about the ','.
>
> while (! in.eof() )
> {
> in.getline (buffer,100);
> pbuffer = strtok (buffer,",");
>
> while (pbuffer != NULL)
> {
> strcpy (data[n].code,pbuffer);
>
> pbuffer = strtok (NULL, ",");
> data[n].date = atol (pbuffer);
>
>

Certainly a valid overall approach - reading in a 'line' [a character
sequence delimited by a newline] into memory then tokenising it [though, I
agree, there are more elegant approaches available]. However, don't give up
on streams just yet - you simply have to learn how to use them :) !
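
As one example of a stream-based take on the same 'read a line, then
tokenise it' idea, here is a rough sketch using 'std::getline' and
'std::istringstream' in place of 'strtok'. The names 'splitLine' and
'readRecords' are invented for illustration, and every field is kept as text;
converting fields to 'long' or 'float' is left out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of text into comma-delimited fields
std::vector<std::string> splitLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream tokens(line);
    std::string field;

    // Each call extracts one field and discards the ',' that follows it
    while (std::getline(tokens, field, ','))
        fields.push_back(field);

    return fields;
}

// Hypothetical helper: read every line of the stream and split each one
std::vector<std::vector<std::string>> readRecords(std::istream& in)
{
    std::vector<std::vector<std::string>> records;
    std::string line;

    // Looping on the read itself, rather than on in.eof(), means a failed
    // read is never processed as if it were data
    while (std::getline(in, line))
        records.push_back(splitLine(line));

    return records;
}
```

Because 'std::string' grows as needed, there is no fixed-size buffer to
overflow, which removes one of the concerns discussed further on.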

>
> I don't think this is an elegant solution, but it works. I
> don't fully understand why or how it works (damn internet
> tutorials and examples making me confused) but I've only been
> at this 4 days).
>
> I also don't understand why :
>
> in.getline(name, NAME_LEN + 1, ',');
>
> while (in)
> {
> cout << name << endl;
> consolePause();
> in.getline(name, NAME_LEN + 1, ',');
> }
>
> wouldn't move to the next line. Hopefully these things will
> start to unfold as I learn more c++
>

It isn't a mystery - see below. Take, for example, the following data [the
'|' is not data but is used here to mark the start of a 'line']:

   |xxxx,yyyy,zzzz<newline>
   |xxxx,yyyy,zzzz<newline>
   |xxxx,yyyy,zzzz<newline>

An initial 'in.getline' call using a ',' as delimiter will extract 'xxxx'
and the ',' delimiter. The next such call extracts 'yyyy' and ','. The one
after that extracts 'zzzz' and the newline but, since the newline is not the
specified delimiter, it is stored as an ordinary character and reading
continues, up to a maximum of NAME_LEN characters, in the hope that the ','
delimiter is found.

It is, with the consequence that:

   |zzzz<newline>xxxx

is read in as a single character sequence ! The pattern repeats until the
final character sequence:

   |zzzz<newline>

is read in [this read reaches end-of-file, so the very next read, finding
nothing left to extract, fails and terminates the loop].
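
The behaviour just described can be reproduced with a short sketch. It
assumes NAME_LEN is 20 [the actual value is not shown in this thread], and
'readCommaDelimited' is a made-up name; the loop is the one quoted above,
with the output and pause calls replaced by storing each character sequence.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

const int NAME_LEN = 20;   // assumed value, not given in the original post

// Hypothetical helper: gather every character sequence the quoted loop
// would have printed
std::vector<std::string> readCommaDelimited(std::istream& in)
{
    std::vector<std::string> tokens;
    char name[NAME_LEN + 1];

    in.getline(name, NAME_LEN + 1, ',');

    while (in)
    {
        // A newline is not the delimiter here, so it ends up stored in
        // 'name' like any other character
        tokens.push_back(name);
        in.getline(name, NAME_LEN + 1, ',');
    }

    return tokens;
}
```

Fed the three-line sample, this should yield seven character sequences, the
third and fifth being 'zzzz<newline>xxxx', and the last 'zzzz<newline>'.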

However, if the data were instead:

   |xxxx,yyyy,zzzzzzzzzzzzzzzzzzzzzzzzzzzz<newline>
   |xxxx,yyyy,zzzzzzzzzzzzzzzzzzzzzzzzzzzz<newline>
   |xxxx,yyyy,zzzzzzzzzzzzzzzzzzzzzzzzzzzz<newline>

where the 'zz...' character sequence is 28 characters long, exceeding
NAME_LEN by 8 characters, then only the 'xxxx' and 'yyyy' character
sequences would be successfully read in - the lack of room to accommodate
all the 'zz...' characters would force the stream into an error state [i.e.
'while (in)' would then evaluate to false, and the loop terminate].
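
One way to tell this kind of failure apart from simply running out of data
is to look at the stream state once the loop ends. The following is only a
sketch: NAME_LEN is again assumed to be 20, and both 'ReadResult' and
'readAndDiagnose' are names invented for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

const int NAME_LEN = 20;   // assumed value, not given in the original post

// Hypothetical result type: the fields read, plus why the loop stopped
struct ReadResult
{
    std::vector<std::string> fields;
    bool reachedEnd;   // true: input exhausted; false: a read failed first
};

ReadResult readAndDiagnose(std::istream& in)
{
    ReadResult result;
    char name[NAME_LEN + 1];

    // The loop ends as soon as a getline call puts the stream into a
    // failed state
    while (in.getline(name, NAME_LEN + 1, ','))
        result.fields.push_back(name);

    // A field too long for the buffer fails the stream without reaching
    // end-of-file; running out of input reaches end-of-file as well
    result.reachedEnd = in.eof();

    return result;
}
```

With the long 'zz...' data, 'fields' should hold just 'xxxx' and 'yyyy', and
'reachedEnd' should be false, signalling a data problem rather than the end
of the file.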

In order to correctly handle such data you would need to ensure:

* The input buffer is large enough to accommodate the
    largest anticipated delimited character sequence

* The newline is correctly extracted and handled

To implement the first task you could read each delimited character sequence
into an over-large 'char' array, test its length and, if it passes, copy it
to the relevant storage location. Of course, you can also use correct-size
storage areas and simply let the failure of a read indicate the presence of
a data problem. Use of the 'std::string' type is yet another option.
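
A sketch of the 'read, test the length, then copy' idea follows, here using a
'std::string' as the over-large intermediate buffer. It assumes NAME_LEN is
20, and 'readName' is a hypothetical helper, not something from the code
discussed in this thread.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

const std::size_t NAME_LEN = 20;   // assumed value

// Hypothetical helper: read one comma-delimited field and, if it fits,
// copy it into 'dest', which must have room for NAME_LEN + 1 characters.
// Returns false if nothing could be read or the field is too long.
bool readName(std::istream& in, char* dest)
{
    std::string field;

    // A std::string grows as needed, so an over-long field does not put
    // the stream into an error state
    if (!std::getline(in, field, ','))
        return false;

    // Test the length before copying to the fixed-size storage
    if (field.size() > NAME_LEN)
        return false;

    std::strcpy(dest, field.c_str());
    return true;
}
```

One consequence of this design is that an over-long field is rejected while
the stream itself stays usable, so the caller can choose to skip the field
and carry on reading.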

To correctly extract the newline use either an 'in.getline' or an
'in.ignore', in each case specifying the newline as delimiter. So, you might
try using an 'in.getline' with ',' as delimiter for each delimited character
sequence except the last on a line, followed by one with the newline as
delimiter for that last sequence.
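
That suggestion might look something like the sketch below, which reads the
three-field sample lines used earlier. NAME_LEN is assumed to be 20, and the
'Record' type, its member names, and 'readThreeFieldRecords' are all invented
for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

const int NAME_LEN = 20;   // assumed value

// Hypothetical record type matching the 'xxxx,yyyy,zzzz' sample lines
struct Record
{
    char first[NAME_LEN + 1];
    char second[NAME_LEN + 1];
    char third[NAME_LEN + 1];
};

std::vector<Record> readThreeFieldRecords(std::istream& in)
{
    std::vector<Record> records;
    Record r;

    // The first two fields end at a ',' while the last ends at the
    // newline; each getline extracts and discards its own delimiter. A
    // record is stored only if all three reads succeed.
    while (in.getline(r.first,  NAME_LEN + 1, ',') &&
           in.getline(r.second, NAME_LEN + 1, ',') &&
           in.getline(r.third,  NAME_LEN + 1, '\n'))
    {
        records.push_back(r);
    }

    return records;
}
```

A line with too few fields, or with a field longer than NAME_LEN, would stop
the loop early, so checking the stream state afterwards is worthwhile.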

I'll leave the explanation at that ! However, be aware that there is more to
streams, including other data extraction techniques [e.g. stream operators,
using stringstreams, etc], and issues such as stream state checking /
resetting, things which will certainly become clear with time. The keys to
success in this endeavour include:

* Be a fearless experimenter
* Read the relevant doco

I hope this helps.

Anthony Borla


