Re: Unwanted rounding



In article <5PmdnXasgJ7mo-7YnZ2dnUVZ_r6dnZ2d@xxxxxxxxxxx> Joe Wright <joewwright@xxxxxxxxxxx> writes:
Gordon Burditt wrote:
[ snip ]
Given our ubiquitous 64-bit IEEE double (53 mantissa bits)
36.099 as double has no precision beyond
3.6098999999999997e+01
That printf("%.60f", 36.099) can give you something like
36.098999999999996646238287212327122688293000000000000000000000
might tempt you into believing you have precision to 40+ digits. You don't.

Since the value given above doesn't end in 5 followed by trailing
zeroes, and it's not an exact integer, your example won't happen
unless printf() is introducing unwanted rounding.

Where is (at what position) printf introducing this rounding?

A floating point number is (by definition) a number of the form
m * base^exp
where m and exp are integers (the possibility that m is a fraction
can be ignored, because it can be made an integer by a suitable change
of the exponent) and base is the base of the representation,
which is 2 in IEEE. So a floating point number is in essence a
rational number. If the base contains only the prime factors 2 and/or
5, the denominator of that number is a divisor of a power of 10,
and so the number has an exact representation in finite decimal
notation. With base 2 and a negative exponent, the last nonzero digit
of that decimal representation is always a 5. So if printf is giving
the above representation, it is doing some rounding, because that is
not the exact value of any IEEE floating point number.

The point of the output is that you have three consecutive floating
point numbers (with no intermediate values in between) so rounding
decimal numbers to put them in floating-point variables is inevitable
and will result in errors.

I'm at a loss here. I have no idea what you mean.

Given some number in decimal notation, there is either a single
floating point number that it matches exactly, or there are two
consecutive floating point numbers, one of them larger and one of them
smaller than the number given. 36.099 does not have an exact
representation, so there are two such numbers, one larger and one smaller.

A floating-point variable contains a number (except when it's NaN
or Inf or some such thing) and it is perfectly possible and reasonable
to print out *EXACTLY* what that value is, to infinite precision,
particularly when investigating problems of unwanted precision loss
or comparing what you got with what you should have gotten if
everything was done in infinite-precision math.

A floating point variable (double, let's say) can hold a value precise
to approximately 17 decimal digits. Nothing infinite about it.

Each floating point number is exactly representable in decimal notation.
Again, nothing infinite in it.

Now, if that number 36.099 represents the weight in kilograms of
something, you are correct that it is highly unlikely to have
anywhere near 17 digits of precision in the result.

Why kilograms? The double has 53 bits and about 17 digits of precision
no matter whether its value is kilos, nanos or light years. Using printf
and friends to show decimal digits beyond 17 or so is misleading.

That may well be, but it is a different matter. Showing exact
representations to demonstrate that 36.099 is not representable as a
floating point number is not misleading at all.
--
*** t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~***/