Re: Portable record length



On Thu, 10 Aug 2006 07:50:59 -0700, Ron Shepard wrote
(in article
<ron-shepard-2B808B.09505910082006@xxxxxxxxxxxxxxxxxxxxxxxx>):

> In the future if you want to write portable files (portable across
> languages on the same machine, or across different machines in
> general), then you should use formatted data files only.

Since I do this kind of thing a *LOT*, I have to disagree with this
advice as being overly generalized. In recent years, I've probably
spent more time with data file matters like this than with crunching
numbers, so this is a very familiar issue to me. I was the one who
formally proposed stream I/O in f2003 (though I got lots of support). I
also tried to squeeze a storage_size intrinsic in for related reasons,
but I didn't make it for that one (looks like it will likely be in
f2003+x).

The "use formatted" advice is fine for some simple things, but it does
not apply in many cases. For example:

1. You might not have the choice of file format. Someone else specified
the format, possibly long ago, and you are using it. It might even be a
"standard" format. This sounds like the OP's situation.

2. Formatted I/O can be an incredibly bad choice for "large" files. The
exact criterion for "large" varies. Formatted files are, in general,
larger and far slower to process than unformatted ones. Long ago
(multiple decades), I used the philosophy of sticking to formatted for
files transferred between machines. It generated lots of complaints
from users. For example, there were the cases where the formatted file
was too large for the storage medium used for transfer, or even too
large for the temporary storage available on the machine. Even when the
files fit, there were complaints about how long the process took.

3. Formatting inherently involves conversions. It can be tricky to make
sure that you don't lose information in the process. Getting the last
bit right can be done, but it takes non-trivial work. Often that
doesn't matter; sometimes it does.

4. Not to mention that formatted files are not perfectly portable
either. Ever had to deal with the issues of CR vs LF vs CR-LF vs
record-length-header formats? See occasional questions here if it
hasn't happened to you. ASCII vs EBCDIC
vs other character codes is less often important, but the issue is out
there. Yes, there are usually utilities to deal with these things, but
the point is that the utilities are needed - it doesn't all just work
by magic.
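
To make points 2 and 3 concrete, here is a small sketch. It is in
Python rather than Fortran, purely for illustration. The function names
are made up for this example, and the raw little-endian 8-byte doubles
only stand in for unformatted stream output (they are not Fortran's
sequential record format, which typically adds record-length markers).
It assumes IEEE 754 double precision.

```python
import struct

def formatted_bytes(values, digits):
    """Encode values as text, one per line, with `digits` significant digits."""
    return "".join(f"{v:.{digits - 1}e}\n" for v in values).encode("ascii")

def unformatted_bytes(values):
    """Encode values as raw 8-byte doubles (little-endian, no record markers)."""
    return struct.pack(f"<{len(values)}d", *values)

def read_formatted(data):
    """Parse one decimal number per line back into floats."""
    return [float(line) for line in data.decode("ascii").splitlines()]

def read_unformatted(data):
    """Unpack raw little-endian 8-byte doubles back into floats."""
    return list(struct.unpack(f"<{len(data) // 8}d", data))

# Hypothetical sample data, chosen only for illustration.
values = [0.1, 1.0 / 3.0, 2.0 ** 0.5, 6.02214076e23]

text15 = formatted_bytes(values, 15)   # 15 significant digits
text17 = formatted_bytes(values, 17)   # 17 significant digits
raw = unformatted_bytes(values)        # 8 bytes per value

# File sizes in bytes for the three encodings.
print(len(text15), len(text17), len(raw))

# Does each encoding reproduce the original bits?
print(read_formatted(text15) == values)   # False: 1/3 is not recovered exactly
print(read_formatted(text17) == values)   # True
print(read_unformatted(raw) == values)    # True
```

With 15 significant digits the text no longer reproduces 1/3 exactly.
17 digits does round-trip any IEEE double, but in this sketch the text
then takes nearly three times the bytes of the raw form.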

--
Richard Maine                     | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov            |        -- Mark Twain
