Re: parse two field file



On Sun, 17 Dec 2006 10:37:28 -0500, Eric Sosman
<esosman@xxxxxxxxxxxxxxxxxxx> wrote:

Richard wrote:
Which way would you guys recommend to best parse a multiline file which contains
two fields separated by a tab? <snip>
strtok(..., "\t") will [lose empty fields]

Right.

fgets() plus sscanf() is a possibility, but it's a bit
tricky to use: The obvious "%s\t%s" will not do what you
want. (The first "%s" will skip any leading white space,
leaving you in the same hole as the strtok() approach, and
the "\t" will match any amount of any kind of white space,
tabs or other.) Something like "%[^\t]%*1[\t]%s" would do
a little better, but still wouldn't be fully satisfactory:

Not enough better. If the first field is empty, the first %[^\t]
matches nothing, and *scanf stops right there (a matching failure)
without ever attempting the %*1[\t] or the %s.
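To make the failure concrete, here is a minimal sketch. The helper name scan_two_fields and the 64-byte buffers are made up for illustration; only the format string comes from the discussion above (with field widths added so the buffers cannot overflow).

```c
#include <stdio.h>

/* Hypothetical helper: applies the "%[^\t]%*1[\t]%s" format to one line
   and returns how many fields sscanf() actually stored (0, 1 or 2). */
int scan_two_fields(const char *line, char first[64], char second[64])
{
    /* Start with empty strings so a failed conversion is easy to spot. */
    first[0] = '\0';
    second[0] = '\0';

    /* The widths (63) keep each conversion inside its 64-byte buffer. */
    return sscanf(line, "%63[^\t]%*1[\t]%63s", first, second);
}
```

Called on "foo\tbar" this stores both fields and returns 2. Called on "\tbar" (empty first field) it returns 0: the scanset needs at least one matching character, so the second field is never looked at.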

This is effectively the same problem faced by the people who periodically
try to use {,f}scanf to replace <ILLEGAL> fflush (input) </>.
(Some people, including IIRC Dan Pop, have recommended e.g.
    scanf ("%*[^\n]"); getchar ();
where the separate getchar () is needed because %*[^\n] fails on an
empty line and nothing after it in the same format would run,
but I consider that too much uglier than the obvious, though slightly
longer and possibly slightly less efficient
    while( (ch = getchar()) != EOF && ch != '\n' ) ;
etc.)
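For reference, the obvious loop packaged as a function. This is only a sketch: the name discard_line and the FILE * parameter are my own choices, so that it works on any input stream rather than just stdin.

```c
#include <stdio.h>

/* Hypothetical helper: discards input up to and including the next
   newline, or up to end of file, whichever comes first.
   Returns the last value read, which is either '\n' or EOF. */
int discard_line(FILE *in)
{
    int ch;   /* int, not char, so that EOF stays distinguishable */

    while ((ch = getc(in)) != EOF && ch != '\n')
        ;     /* nothing to do, just keep reading */

    return ch;
}
```

Unlike the scanf() variants, this behaves the same whether the rest of the line is empty or not, and the return value tells the caller whether end of file was hit.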

Plus unbounded %[...] or %s risks buffer overflow and resulting UB.
You should specify a length at most one less than the buffer size.
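A small sketch of the width rule, assuming an 8-byte buffer; the names FIELD_SIZE and read_word are invented for the example. The width in the format counts characters stored, not the terminating '\0', hence 7 for a buffer of 8.

```c
#include <stdio.h>

#define FIELD_SIZE 8   /* assumed buffer size, for illustration only */

/* Hypothetical helper: reads one whitespace-delimited word from text,
   storing at most FIELD_SIZE - 1 characters plus the terminating '\0'.
   Returns the sscanf() result: 1 on success, EOF if text has no word. */
int read_word(const char *text, char out[FIELD_SIZE])
{
    out[0] = '\0';
    /* "%7s": the 7 must be kept in step with FIELD_SIZE - 1 by hand. */
    return sscanf(text, "%7s", out);
}
```

With input "abcdefghijkl" only "abcdefg" is stored; the rest is left unread rather than written past the end of the buffer.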

It would match the prefix of "foo\tbar baz goozle frobnitz"
without any warning of the trailing junk. You could use
"%[^\t]%*1[\t]%s%n" and then check that sscanf() had in fact
consumed the entire string ...
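The %n check might look roughly like this. It is a sketch, not a recommendation: parse_exact and the 64-byte buffers are hypothetical, and it assumes the line came from fgets() and so may end in a newline.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if line is exactly
   "field<TAB>field" (optionally followed by one newline), else 0. */
int parse_exact(const char *line, char first[64], char second[64])
{
    int used = -1;   /* stays -1 if sscanf() never reaches %n */

    if (sscanf(line, "%63[^\t]%*1[\t]%63s%n", first, second, &used) != 2)
        return 0;    /* a field was missing or empty */
    if (used < 0)
        return 0;    /* defensive: %n was not stored */

    /* Accept the newline that fgets() leaves at the end of the line. */
    if (line[used] == '\n')
        used++;

    /* Anything left over is trailing junk. */
    return line[used] == '\0';
}
```

This accepts "foo\tbar\n" but rejects "foo\tbar baz goozle frobnitz". It still rejects an empty first field, for the reason given earlier.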

... but wouldn't it be simpler just to pick the line
apart for yourself? Read it in with fgets(), use strchr()
to find the first tab <snip>

Yes.
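A minimal sketch of the pick-it-apart approach, assuming the line was read with fgets() into a modifiable buffer. The name split_fields and the choice to treat everything after the first tab as the second field are my assumptions; the snipped text may have handled further tabs differently.

```c
#include <string.h>

/* Hypothetical helper: splits line in place at its first tab.
   On success stores pointers to the two fields and returns 0;
   returns -1 if the line contains no tab. Empty fields are kept. */
int split_fields(char *line, char **first, char **second)
{
    char *tab;

    /* Remove the newline that fgets() leaves, if there is one. */
    line[strcspn(line, "\n")] = '\0';

    tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;       /* not a two-field line */

    *tab = '\0';         /* terminate the first field */
    *first = line;
    *second = tab + 1;   /* may point at an empty string */
    return 0;
}
```

A line such as "\tbar\n" yields an empty first field and "bar" as the second, which is exactly the case strtok() and the sscanf() formats get wrong.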

The "most efficient manor" is the house of Usher. Resist
this unnecessary impulse for efficiency, lest your program meet
the same fate as did that storied manse.

Yes. Or even the hundred-year shay, IIRC grade school. <G>

- David.Thompson1 at worldnet.att.net