Re: On writing negative zero - with or without sign



James Giles writes:

So the real problem is that people tend to think that programs
can represent, manipulate, or produce metaphysically perfect
"exact" zeros in the first place. Such things are fabulous
monsters.

I sure hope that a program can represent an exact zero. [...]

It certainly can represent zero: as one of many possible values
of an approximation that includes zero. If that's your definition
of representing an "exact" zero...

My definition of an "exact" zero is whatever internal representation
results when two identical internal representations are differenced.
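
To make that concrete, here is a minimal sketch (assuming IEEE
arithmetic in the default round-to-nearest mode, where subtracting a
finite value from itself is required to produce an exact +0.0):

REAL X, Z
X = 3.7
Z = X - X      ! exact by the IEEE rules: the difference is +0.0
PRINT *, Z     ! an exact zero, per the definition above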

The point is that zero isn't the
*only* value associated with that approximation. Floating point
programs represent, manipulate, and produce approximations.
Zero is no exception to that rule. If you choose to believe that
such approximations *are* zero and *only* zero, you'll likely
have some problems. As soon as your zero is in the hands of
the float implementation it becomes an approximation.

A subset of floating point numbers can have an exact representation.
Zero should be a member of that subset.
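
For illustration (a small sketch, assuming an IEEE binary format):
values such as 0.5 and 1.0 are stored exactly, while a decimal
literal like 0.1 has no finite binary representation and is stored
as a nearby value:

REAL A, B
A = 0.5                ! exactly representable (a power of two)
B = 0.1                ! not exactly representable in binary
PRINT '(F20.17)', A    ! 0.50000000000000000
PRINT '(F20.17)', B    ! a nearby value, not exactly 0.1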

The danger arises when you start manipulating a number. For
example,

REAL X,Y
LOGICAL L

X = 1.0
Y = SQRT(X*X)
L = X == Y

In this case, I no longer expect L to have the value .TRUE.,
even though mathematically it should be the case. It probably
will be .TRUE. for certain selected values of X that can be
exactly represented, but for other values, expect .FALSE..

In the example given here, I would expect L to be .TRUE. This
is because SQRT is part of the IEEE standard and has required
bounds on its accuracy. So is conversion from decimal. So is
comparison. With those three defined properties of the number
system I would expect L to be .TRUE. for a fairly large set of
literals you might substitute for 1.0 above (including 1.0 itself),
regardless of rounding mode. In fact, even most non-IEEE
implementations can guarantee such behavior.
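
If you want to probe that on your own system, here is a small test
harness (just a sketch, and subject to the compiler-rewriting caveat
discussed below; the literals in the array are arbitrary picks):

REAL X, Y
REAL :: T(5) = (/ 1.0, 1.5, 2.0, 0.1, 3.14159 /)
INTEGER I
DO I = 1, 5
   X = T(I)
   Y = SQRT(X*X)
   PRINT *, T(I), (X == Y)   ! which literals survive the round trip?
END DO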

What about the reverse case:

Y = (SQRT(X))**2

I would expect L to be .TRUE. here as well (for the literal 1.0),
because the mathematically exact calculation (which is not what
floating point does) is well within the required tolerances of the
approximations of the values and operations of floating point. The
properties of the approximations are, ironically, *exactly* defined,
as are the properties of the operations on those approximations.
Except for order of evaluation (which is often left to the language
implementer's discretion), floating point math is deterministic and
well defined.

But subject to approximation.

Now a different (very similar) example with some other literal
than the set I identified above would not only result in L being
.FALSE., but the very rules of IEEE rounding would *require*
that L be false for those values (exactly which set of literals that
is depends on what rounding mode you select).
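
One concrete instance (a sketch in double precision, assuming the
default round-to-nearest mode): the correctly rounded square root of
2.0 squares back to a value just above 2.0, so the comparison is
required to fail:

DOUBLE PRECISION X, Y
LOGICAL L
X = 2.0D0
Y = SQRT(X)**2   ! correctly rounded sqrt, then a rounded multiply
L = X == Y
PRINT *, Y       ! 2.0000000000000004 under round-to-nearest
PRINT *, L       ! .FALSE.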

Now all this ignores the Fortran standard itself which anomalously
(compared to most languages) allows the expression as written to
be evaluated by any "mathematically equivalent" expression instead.
In this case, it's perfectly reasonable for Fortran to notice that
SQRT(X*X) is the same as ABS(X). (Whether it's allowed to
disregard the possibility of intermediate overflow is something
I've seen disputed. I won't argue it here.) So, you might get L
defined to .TRUE. for nearly all values of the literal in the
example.

In which case I didn't choose the best example for the point I was
trying to make. My point is that after manipulation, one should not
expect a number to have the same exact internal representation, even
if algebraically the number should be unchanged. There may be
special cases where the compiler determines that no manipulation
is necessary, so the internal representation stays the same.
For purposes of my example, let's ignore those cases. But we're
drifting away from the main point, which is about my concept of an
"exact" zero, as I defined above.

All this is part of the reason that many (including me) recommend
that new computing students should receive at least a semester
(and maybe more) of instruction on floating point. Most people
get almost no such instruction. Indeed, among the ironic properties
of floating point is that although it can't do exact real arithmetic, its
carefully designed rules can often allow you to do exact discrete
arithmetic. I've often needed integers with more than 32 bits of
range and didn't want to pay the speed penalty of a multiple precision
integer package. IEEE double carries 53-bit significands. Careful
use of double allows correct integer arithmetic provided your
intermediate values don't exceed 2^53 in magnitude. This is often
all I need.
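
A minimal sketch of that technique (assuming IEEE double and its
53-bit significand): integer-valued doubles add, subtract, and
multiply exactly so long as every intermediate stays below 2**53 in
magnitude:

DOUBLE PRECISION A, B, C
A = 123456789012345.0D0   ! needs more than 32 bits, well under 2**53
B = 9876543210.0D0
C = A + B                 ! exact: 123466665555555.0
PRINT '(F20.1)', C
C = 94906265.0D0**2       ! 9007199136250225.0, just under 2**53, still exact
PRINT '(F20.1)', C        ! larger products would have to round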

Actually, I'm dealing with an interesting floating point case right
now, but have been debating whether to bring it up for discussion in
this newsgroup.
