Re: parse two field file



Richard wrote:
Which way would you guys recommend to best parse a multiline file which contains
two fields separated by a tab? In this case it's the
Linux /proc/filesystems file, a sample of which I have included below:

nodev usbfs
ext3
nodev fuse
vfat
ntfs
nodev binfmt_misc
udf
iso9660

The first field can be "empty", in which case the line begins with a
single tab character. The separator is a tab.

Is sscanf best suited to this? Or use strtok/strtok_r?

strtok(..., "\t") will give the same result for "\tfoo"
and "\t\tfoo\t" and "foo". If you *know* that the input has
two tab-separated fields and that only the first (never the
second) can be empty, you can get this to work: If strtok()
finds two fields they are #1 and #2, but if it finds only
one it is #2 with #1 empty.

However, it makes me queasy to put that much faith in an
input source I don't control programmatically. Who knows?
Maybe in six months somebody will extend the format, adding
an optional third field. If that happened, then the field-
counting approach would misinterpret "\tfoo\tbar" as if it
were "foo\tbar". It would be better to adopt a method that
would complain about "\tfoo\tbar" than to be fooled by it.
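
To see the ambiguity for yourself, here is a minimal sketch.
The helper name split_on_tabs() is made up for illustration;
only strtok() itself is standard. It splits a writable line on
tabs and reports how many tokens came out:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place on tabs with strtok()
 * and store up to `max` token pointers in `fields`.
 * Returns the number of tokens found.  strtok() treats any run of
 * delimiters as one separator and never yields an empty token. */
static int split_on_tabs(char *line, char *fields[], int max)
{
    int n = 0;
    for (char *tok = strtok(line, "\t"); tok != NULL && n < max;
         tok = strtok(NULL, "\t")) {
        fields[n++] = tok;
    }
    return n;
}
```

Fed "\tfoo", "\t\tfoo\t", or "foo", it reports one token, "foo",
each time. Fed "\tfoo\tbar" or "foo\tbar", it reports the same
two tokens, so the empty leading field is invisible to it.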

fgets() plus sscanf() is a possibility, but it's a bit
tricky to use: The obvious "%s\t%s" will not do what you
want. (The first "%s" will skip any leading white space,
leaving you in the same hole as the strtok() approach, and
the "\t" will match any amount of any kind of white space,
tabs or other.) Something like "%[^\t]%*1[\t]%s" would do
a little better, but still wouldn't be fully satisfactory:
"%[" must match at least one character, so a line with an
empty first field fails to match and needs separate handling,
and the format would match the prefix of
"foo\tbar baz goozle frobnitz" without any warning of the
trailing junk. You could use "%[^\t]%*1[\t]%s%n" and then
check that sscanf() had in fact consumed the entire string ...
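
For what it's worth, the "%n" check might look something like
the sketch below. The function name scan_two_fields() and the
64-byte buffers are assumptions made for the example, not part
of any library. Because "%[^\t]" cannot match an empty field,
a line that starts with a tab is handled as a special case.
Note that "%s" still skips white space ahead of the second
field, so "foo\t\tbar" slips through unnoticed.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: parse "first<TAB>second\n" from `line`.
 * `f1` and `f2` must each have room for at least 64 bytes.
 * Returns 0 on success, -1 if the line does not fit the pattern. */
static int scan_two_fields(const char *line, char *f1, char *f2)
{
    const char *rest = line;
    int used = -1;

    if (line[0] == '\t') {
        /* Empty first field: "%[^\t]" would fail here, so step
         * over the tab by hand and scan only the second field. */
        f1[0] = '\0';
        rest = line + 1;
        if (isspace((unsigned char)rest[0]) ||
            sscanf(rest, "%63s%n", f2, &used) != 1)
            return -1;
    } else if (sscanf(line, "%63[^\t]%*1[\t]%63s%n", f1, f2, &used) != 2) {
        return -1;
    }

    /* "%s" stops before the newline that fgets() keeps; allow one. */
    if (rest[used] == '\n')
        used++;

    /* Anything left over is trailing junk. */
    return rest[used] == '\0' ? 0 : -1;
}
```

With this, "nodev\tusbfs\n" and "\text3\n" are accepted, while
"foo\tbar baz goozle frobnitz\n" is rejected because the scan
stops short of the end of the string.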

... but wouldn't it be simpler just to pick the line
apart for yourself? Read it in with fgets(), use strchr()
to find the first tab (syntax error if there isn't one), and
the first (possibly empty) field is everything from the start
to just before the tab. Then start just after the tab and use
strchr() again to find the terminating '\n'; the second field
is everything from just after the tab to just before the '\n'
(syntax error if its length is zero). You can use strcspn()
to check that the second field contains no white space and
squawk if it does (somebody added a third field you don't
understand).
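
The steps above can be sketched roughly as follows. The name
split_line() and its return convention are made up for the
example; it assumes the line came from fgets() into a writable
buffer large enough to hold the whole line, newline included.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place into two fields.
 * On success returns 0 and points *first and *second at the
 * NUL-terminated fields inside `line`.  Returns -1 on a syntax
 * error and leaves `line` unmodified. */
static int split_line(char *line, char **first, char **second)
{
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return -1;                      /* no separator at all */

    char *nl = strchr(tab + 1, '\n');
    if (nl == NULL)
        return -1;                      /* truncated or unterminated line */

    size_t len = (size_t)(nl - (tab + 1));
    if (len == 0)
        return -1;                      /* second field is empty */

    /* Any white space before the newline suggests extra fields. */
    if (strcspn(tab + 1, " \t\n\v\f\r") != len)
        return -1;

    *tab = '\0';                        /* terminate the first field */
    *nl = '\0';                         /* terminate the second field */
    *first = line;
    *second = tab + 1;
    return 0;
}
```

Called on "\text3\n" it yields an empty first field and "ext3";
called on "\tfoo\tbar\n" it returns -1 instead of guessing.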

The field I am really interested in is the second one: any hints & tips
appreciated as to how to do this in the most efficient manor.

The "most efficient manor" is the house of Usher. Resist
this unnecessary impulse for efficiency, lest your program meet
the same fate as did that storied manse.

(In other words: How long is this file, anyhow? How many
times will you scan its contents? If you sped up the scanning
by a factor of four hundred twenty gazillion, how much faster
would the program as a whole run? If you give your SUV a coat
of wax, will you improve its fuel economy by making it slipperier
or harm it by adding weight?)

--
Eric Sosman
esosman@xxxxxxxxxxxxxxxxxxx