Re: parsing from file

From: Ralmin (news_at_ralminNOSPAM.cc)
Date: 05/17/04


Date: Mon, 17 May 2004 09:37:51 GMT


"Karthik" <removeme_kaykaydreamz@yahoo.com> wrote:
> A thumb rule to deal with files is as follows -
>
> Copy all file contents to memory.
> Close the file
> Process the file contents from data saved in Step 1.

I would suggest that approach only if the algorithm needs to move back and
forth across the whole file's data. Even then, for particularly large files
where reading everything into memory is not viable, you may be better off
seeking within the file with fseek().
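
As a rough sketch of the fseek() route (read_at is a made-up helper name,
not a standard function, and it assumes the stream was opened in binary
mode), a program can fetch just the bytes it needs instead of holding the
whole file in memory:

```c
#include <stdio.h>

/* Hypothetical helper: read up to len bytes starting at byte offset
   `offset` of an already-open binary stream.
   Returns the number of bytes actually read (0 if the seek fails). */
size_t read_at(FILE *fp, long offset, void *dst, size_t len)
{
    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;                   /* could not reposition the stream */
    return fread(dst, 1, len, fp);  /* may be short near end of file */
}
```

A short count from read_at means the request ran past the end of the file
or a read error occurred; ferror(fp) tells the two apart.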

> This would give a big performance boost.

I don't see how it gives a big performance boost. It might make your
program use much more memory than is necessary.

> For eg-
>
> while (!feof(fp) ) {
> fscanf( fp, "%s", buff);
> }

This is a terrible example. Seeing while(!feof(fp)) should flag problems
immediately. A while loop should depend on the success or failure of the
actual file reading function, not the secondary feof test. feof only
reports true after a read has already hit end-of-file, so this pattern
often causes off-by-one errors in the number of times it loops.
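
To make the miscount concrete, here is a small sketch that counts
iterations both ways. The function names and the 32-byte buffer are my own
choices for illustration; the first function deliberately repeats the
flawed pattern (with a field width added so it is at least memory-safe).

```c
#include <stdio.h>

/* Counts iterations the way the quoted loop does: feof() is tested
   before each read, so a final read that fails is still counted. */
int count_with_feof(FILE *fp)
{
    char buff[32];
    int n = 0;
    while (!feof(fp)) {
        int r = fscanf(fp, "%31s", buff);
        (void)r;        /* result ignored on purpose, as in the quoted code */
        n++;
    }
    return n;
}

/* Counts only the reads that actually produced a string. */
int count_with_result(FILE *fp)
{
    char buff[32];
    int n = 0;
    while (fscanf(fp, "%31s", buff) == 1)
        n++;
    return n;
}
```

For a file holding "one two three" followed by a newline, the first
function returns 4 and the second returns 3: after "three" is read the
newline is still pending, so feof is not yet true and the loop body runs
once more with a read that fails.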

scanf or fscanf with a plain "%s" is just as bad as the gets function: it
has no way to prevent writing outside the bounds of the buffer given. You
must always specify a maximum field width with the %s specifier. In
addition, your loop never checks the return value of fscanf, and it just
keeps overwriting the same buffer with each (whitespace-delimited) string
read, without storing those strings separately in memory.
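
If you do want whitespace-delimited strings, something along these lines
addresses all three points. It is only a sketch: read_words, MAX_WORDS and
WORD_LEN are names and limits I made up for the example.

```c
#include <stdio.h>

#define MAX_WORDS 64    /* assumed capacity, chosen for illustration */
#define WORD_LEN  32    /* buffer size per word, including the '\0' */

/* Reads whitespace-delimited words into words[], storing at most max.
   Returns the number of words stored. */
size_t read_words(FILE *fp, char words[][WORD_LEN], size_t max)
{
    size_t n = 0;
    /* The width 31 must stay equal to WORD_LEN - 1 so that fscanf
       always has room for the terminating '\0'. */
    while (n < max && fscanf(fp, "%31s", words[n]) == 1)
        n++;
    return n;
}
```

Each word lands in its own row of the array, the loop stops as soon as
fscanf fails to convert anything, and a word longer than 31 characters is
split across consecutive rows rather than overflowing one.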

In this case I'd parse one line at a time:

while(fgets(buff, sizeof buff, fp))
{
  /* work on the current line in buff */
}
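
One detail to keep in mind with this loop: a line longer than the buffer
comes back from fgets in several pieces, and only the last piece ends in
'\n'. A minimal sketch that counts lines with that in mind (count_lines and
the 128-byte buffer are just illustrative choices):

```c
#include <stdio.h>
#include <string.h>

/* Counts the lines in a text stream, treating a long line that fgets
   returns in several pieces as a single line. */
size_t count_lines(FILE *fp)
{
    char buff[128];
    size_t lines = 0;
    int at_line_start = 1;  /* is the next piece the start of a new line? */

    while (fgets(buff, sizeof buff, fp)) {
        size_t len = strlen(buff);
        if (at_line_start)
            lines++;
        /* The line is complete only if this piece ends with a newline. */
        at_line_start = (len > 0 && buff[len - 1] == '\n');
    }
    return lines;
}
```

A final line with no trailing newline is still counted, because the count
is taken when a line begins rather than when it ends.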

-- 
Simon.

