# Re: Numerical accuracy of C++ and Fortran programs on 32 bit machines

Thomas Smid wrote:
> Duane Bozarth wrote:
>
> > The logic in the Standard is that in the general case, there is no
> > information regarding what those decimal digits <should> be, so the
> > low-order binary digits are set to zero which results in the case as
> > seen. It's only in the special cases such as you have here where
> > heuristically you can see the value should be zero-extended or a
> > continuing fraction, etc., that it makes actual sense to do
> > so...otherwise, there is no information for the compiler to rely on that
> > is "more" correct than the zero lower significant binary digits.
> >
> > The lesson is, if in Fortran one wants full double precision, specify it
> > initially and initialize constants appropriately.
>
> I found now the following link dealing with the problems of Fortran-
> floating point operations : http://www.lahey.com/float.htm .
>
> This also mentions an example practically identical to mine:
> __________________
>
> The following program prints "1.66661000251770" when compiled with
> Lahey's LF90:
>
>     DOUBLE PRECISION D
>     REAL X
>     X = 1.66661 ! Assign to single precision
>     D = X ! Convert to double precision
>     PRINT *, D
>     END
>
> You ask, "Why do you extend the single-precision number with the
> seemingly random '000251770'?" Well, the number isn't extended with
> random values; the computer's floating-point does the conversion by
> padding with zeros in the binary representation. So D is exactly equal
> to X, but when it is printed out to 15 decimal digits, the inexactness
> shows up. This is also another example of insignificant digits.
> Remember that assigning a single-precision number to a double-precision
> number doesn't increase the number of significant digits.
> _______________________
>
> Even though this may be so, the point is that the apparent accuracy of
> operations is inconsistent: as mentioned before, for x=1.2 and y=0.8
> you get after conversion to double precision x=1.2000000476837158E+00
> y=8.0000001192092896E-01 and x+y= 2.0000000596046448E+00. However, for
> x=1.1 and y=0.9 you get x=1.1000000238418579E+00 ,
> y=8.9999997615814209E-01 and x+y=2.0000000000000000E+00. So in the
> latter case the result of the addition suggests that it is exact to
> double precision, i.e. had I made my test originally with the latter
> values, I might have never noticed that in general the result is only
> single precision here. I very much feel that this situation is
> unacceptable.
>
> Of course, as you said, you can avoid all this trouble by using the
> double precision constants 1.2D0 etc. (and at the moment this seems to
> be the only way to deal with this issue), but I wonder then what the
> purpose of having different data types (and conversions between them)
> is in the first place. In the present situation, it would be consistent
> if any mixed expressions are declared a syntax error, or at least the
> compiler should issue a warning, but in my opinion it should not be so
> difficult to make the compiler realize that 1.2 etc. are fixed point
> constants that should translate into 1.200000000000000E+00 etc. in
> double precision.

To elaborate on Duane's answer:
Fortran is peculiar in defaulting real constants to single precision
(the precision of a C++ float) unless you write them as 1.2D0 etc.
But this isn't your core problem.

You will get FP behaviour that is inconsistent with respect to decimal
arithmetic on almost every type of computer currently in use, since
they use binary arithmetic. Once a value has been converted to binary,
the information needed to recover the original decimal literal is lost.
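
To make this concrete, here is a minimal sketch in Python (chosen only because it is easy to run; it is not the Fortran from this thread). The helper name `to_single` is my own invention. It rounds a value to IEEE binary32 using the standard `struct` module and hands it back as a Python float, which is binary64, so the widening step plays the role of `D = X`. I am assuming IEEE 754 arithmetic throughout.

```python
import struct

def to_single(value: float) -> float:
    """Round to the nearest IEEE binary32 value; return it widened to binary64."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Widening is exact: the double holds the single's bits padded with zeros,
# so the single-precision rounding error becomes visible at 17 digits.
x, y = to_single(1.2), to_single(0.8)
print(f"{x:.16E} {y:.16E} {x + y:.16E}")

# Same procedure, different literals: here the two rounding errors are
# equal and opposite, so they cancel and the sum comes out as exactly 2.
u, v = to_single(1.1), to_single(0.9)
print(f"{u:.16E} {v:.16E} {u + v:.16E}")
```

The two lines printed should match the figures quoted from the original post. The "exact" 2.0 in the second case is a coincidence of cancelling errors, not a sign that the operands carried double precision.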

The compiler is not made to handle situations like this:

    X = 1.66661 ! Assign to single precision
    D = X

It is made to handle the situation where X is the result of other
calculations or comes from a file. In that case the compiled code
cannot be expected to produce the decimal value you had in mind, since
the information needed to construct it is lost.
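
A rough way to see why the information is gone, again sketched in Python under the assumption of IEEE binary32: several different decimal literals land on the same single-precision bit pattern, so nothing in X tells the conversion which one was meant. The helper `single_bits` and the sample literals are my own choices for illustration.

```python
import struct

def single_bits(text: str) -> bytes:
    """Parse a decimal literal and return the bytes of its nearest binary32 value."""
    return struct.pack("<f", float(text))

# Three different decimal literals, all within half a unit in the last
# place of the same single-precision number.
literals = ["1.66661", "1.6666100025", "1.66661004"]
patterns = {single_bits(t) for t in literals}
print(len(patterns))  # 1: the literals are indistinguishable once stored

# A literal slightly further away rounds to the neighbouring value instead.
print(single_bits("1.6666101") in patterns)  # False
```

Given only the stored bits, padding with binary zeros is the one conversion that does not invent digits.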

If you need fixed-point arithmetic to work as expected for your
application, use a fixed-point library, not FP.
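
As a sketch of what that buys you, here is the same arithmetic done with Python's standard `decimal` module, standing in for whatever fixed-point library you would actually use. The helper `fixed` and the choice of five decimal places are assumptions for the example, not a recommendation.

```python
from decimal import Decimal

QUANTUM = Decimal("0.00001")  # five decimal places, chosen arbitrarily

def fixed(text: str) -> Decimal:
    """Parse a decimal literal and pin it to a fixed number of decimal places."""
    return Decimal(text).quantize(QUANTUM)

# The literals are stored in decimal, so no binary rounding takes place.
total = fixed("1.2") + fixed("0.8")
print(total)             # 2.00000
print(fixed("1.66661"))  # 1.66661
```

The price is speed and range, which is why FP remains the default for numerical work.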
