Re: Programming in standard c



Yes, you'd have to pass a parameter specifying which mode to use,
or open the file and let the system use the same mode as what you
opened it with. Or have two different functions, like filetextsize()
and filebinarysize().
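
As a rough sketch only, the two-function variant could look like the following in portable C. The names filetextsize() and filebinarysize() come from the text above; the helper count_chars() and the -1 error convention are assumptions made for illustration. This version takes the slow route of reading the whole file in the requested mode, so it shows what the answer means, not how a system should compute it.

```c
#include <assert.h>
#include <stdio.h>

/* Count the characters that reading `path` in the given fopen() mode
   would deliver. Returns -1 if the file cannot be opened or a read
   error occurs (an illustrative convention, not a standard one). */
static long count_chars(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);

    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    while (getc(fp) != EOF)
        n++;

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : n;
}

/* Size as seen by a text-mode reader (line endings translated). */
long filetextsize(const char *path)
{
    return count_chars(path, "r");
}

/* Size as seen by a binary-mode reader (bytes as stored). */
long filebinarysize(const char *path)
{
    return count_chars(path, "rb");
}
```

On a system where text and binary mode are identical, the two functions return the same value; where they differ, only the text-mode count matches what a text-mode read will actually put in your buffer.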

Which means the OS, when writing a block of data, no longer has to merely
write it, but parse it - look for any embedded characters which would be
translated into greater or lesser sequences, and record that value as
well. I suspect this is going to have an impact on performance - assuming
you can get 'em to do it at all.

I did not say the OS has to have either or both of the sizes
precalculated. If the result impacts performance more than reading
the whole file and counting bytes, taking into account things like
how often file sizes of either type are needed and how often writes
are done, then someone made a poor decision of pessimization.

I consider having the text file size used for reading the file into
memory to be used insufficiently often to make it worth caching it.
Your opinion may differ.

POSIX happens to keep *both* precalculated, since there's no
difference between binary and text mode. Windows keeps the binary-mode
size precalculated. Thus, performance for getting the text-mode
size may suck significantly more than getting the binary-mode size
on Windows.

Alternatively, the function itself could do the job, by opening the file
and reading the file, in the appropriate mode, beginning to end. Can you
say performance hit?

If you don't need the correct answer, you can do it in zero bytes
and zero time. But if the performance hit is so bad, maybe you
shouldn't use a method that needs a precalculated file size. That
approach of reading the file in chunks (this is to read it into
memory, NOT precalculate the size) and realloc()ing when needed
(say, doubling each time, with fallback if you run out of memory)
is starting to look more and more efficient all the time, even with
the copying (if any).
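
A minimal sketch of that chunked approach, assuming a hypothetical helper named slurp() and an arbitrary 4096-byte starting size (neither is from the original discussion). It doubles the buffer when it fills and, if doubling fails, retries with smaller growth steps before giving up.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MIN_STEP 4096u   /* arbitrary starting size and smallest growth step */

/* Read everything remaining on fp into a malloc'd buffer.
   On success returns the buffer and stores the byte count in *len;
   on failure returns NULL. The caller frees the buffer. */
char *slurp(FILE *fp, size_t *len)
{
    assert(fp != NULL && len != NULL);

    size_t cap = MIN_STEP, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        used += fread(buf + used, 1, cap - used, fp);
        if (used < cap)
            break;                      /* short read: EOF or error */

        /* Buffer is full: try to double it, falling back to smaller
           growth steps if the allocator refuses. */
        size_t step = cap;
        char *tmp = NULL;
        while (step >= MIN_STEP && tmp == NULL) {
            if (step <= SIZE_MAX - cap)
                tmp = realloc(buf, cap + step);
            if (tmp == NULL)
                step /= 2;
        }
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap += step;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

No file size is needed up front, and the count stored in *len is by construction the size in whatever mode fp was opened with.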

While we're at it, how about revisiting the strategy of reading the
entire file into memory? Is it really a good idea? If the file is
large, you may force parts of this program or other programs to page
out. Slow. Now, depending on what you are doing with the file, reading
it in chunks might be worse. Or better. If you're just dumping the
file in hex, reading chunks at a time lets your program run in much
less memory, and makes it work on files MUCH larger than what you can
fit in memory.
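
For the hex dump case, something like the sketch below would do. The function name hexdump(), the 4096-byte chunk, and the 16-bytes-per-line layout are illustrative choices, not anything specified in this thread. Memory use stays fixed no matter how large the input is.

```c
#include <assert.h>
#include <stdio.h>

/* Dump `in` to `out` as hex, 16 bytes per line, using one fixed-size
   buffer. Returns 0 on success, -1 on a read error. */
int hexdump(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    unsigned char chunk[4096];
    size_t n;
    int col = 0;                        /* bytes printed on the current line */

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            col = (col + 1) % 16;
            fprintf(out, "%02x%c", chunk[i], col == 0 ? '\n' : ' ');
        }
    }
    if (col != 0)
        fputc('\n', out);               /* finish a partial last line */

    return ferror(in) ? -1 : 0;
}
```

The input should be opened in binary mode ("rb") so the dump shows the bytes as stored.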

If you are intending to read the file into memory, which mode do you
intend to use when reading it? That is the correct mode to use for
computing the size.

This assumes I will only ever read the file in one mode, or determine size
by reading the file, bytewise, at time of determining the size. The
former isn't reliable, the latter is hellishly inefficient.

Each time you read the file into memory, you read it in *one* mode,
I hope (no switching in the middle of the file). When you want the
file size for that buffer, you read it in that one mode. How you
read it last time or will read it next time is irrelevant.

You made a bad decision, performance-wise, to use a precalculated
file length, especially in text mode if the OS doesn't keep the
value handy and text mode != binary mode. Stick with that decision,
and performance is going to suck.

If you *must* have a precalculated value, have the OS save the one
involving the same mode that the file was written in (and which
kind it is). My guess is that this will cover at least 80% of the
times that file size is needed for the purpose of reading the file
into memory.

will our file size function return? Should it have a parameter which
lets you specify? Is the size you determine _now_ the size of the file
at the time you read it? Several examples have been given where this
won't be the case.

The function returns a value as of a particular time.

Yes, but again - which value?

It returns the one associated with the mode you intend to use to read
the file into memory. You have to make up your mind which mode to use
before you start reading. Use the same decision when you determine
the file size.

There are other uses of the file size, such as comparing the output size
with the expected output size in a regression test.

Sure. Now, again, *which* file size? Determined *how*?

The size associated with the mode the file was written in, if your
application knows what that is (and no, I don't expect the OS to
keep track of it). It's up to your application to know what mode
to open its own files in. Either it knows from what prompt was
answered (e.g. text editors always do text files; graphics editors
always do binary files), or the file extension, or it asks the user,
or it just handles generic files and can do everything in binary
mode.

(This assumes that the reference "correct" output was generated on
THIS system or was converted to the local file format. If it wasn't,
well, size comparisons may be totally worthless). Since here,
you're using file size as a shortcut for comparing the files for
equality to quickly find a mismatch, you can dispense with the step
entirely and proceed to reading the files byte-by-byte and comparing
them if finding the size is a performance bottleneck.
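
A sketch of that direct comparison, with no size query at all. The name files_equal() and its 1/0/-1 return convention are hypothetical, chosen for illustration. Both files are opened in the same caller-supplied mode, and a length mismatch shows up naturally as one stream hitting EOF before the other.

```c
#include <assert.h>
#include <stdio.h>

/* Compare two files character by character in the given fopen() mode.
   Returns 1 if identical, 0 if they differ, -1 if either file cannot
   be opened or read. */
int files_equal(const char *a, const char *b, const char *mode)
{
    assert(a != NULL && b != NULL && mode != NULL);

    FILE *fa = fopen(a, mode);
    FILE *fb = fopen(b, mode);
    int result = -1;

    if (fa != NULL && fb != NULL) {
        int ca, cb;
        do {
            ca = getc(fa);
            cb = getc(fb);
        } while (ca == cb && ca != EOF);

        /* The loop stops at the first mismatch or when both hit EOF. */
        if (!ferror(fa) && !ferror(fb))
            result = (ca == cb);
    }

    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

For a regression test, calling files_equal(actual, expected, "r") or files_equal(actual, expected, "rb") according to how the output was written stops at the first differing character.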
