Re: Floating point to integer casting



"Morris Keesan" <mkeesan@xxxxxxxxxxxxxxxx> writes:

On Mon, 12 Oct 2009 19:42:49 -0400, bartc <bartc@xxxxxxxxxx> wrote:

chad wrote:
On Oct 12, 10:32 am, Tim Rentsch <t...@xxxxxxxxxxxxxxxxxx> wrote:
Anand Hariharan <mailto.anand.hariha...@xxxxxxxxx> writes:
On Oct 12, 5:36 am, Tim Rentsch <t...@xxxxxxxxxxxxxxxxxx> wrote:
"bartc" <ba...@xxxxxxxxxx> writes:
"Tim Rentsch" <t...@xxxxxxxxxxxxxxxxxx> wrote in message

A minor point -- assignment _always_ performs a conversion,
whether the types of the two sides are the same or different.

So what conversion is performed when assigning an int value to an
int destination of the same width?

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

6.5.16.1

6.3

Is there any context where this subtlety (viz., a "conversion" is
performed even when the types of the operands on either side of = are
the same) is important? If so, is it important to the implementor or
even to the programmer? How/Why?

Yes there is, when floating-point types are involved.
Conversions in such cases are required to discard extra
precision and range (see 6.3.1.5p2). For example, in

double a, b, c;

...

a = b + c;

the plus operation can be computed in greater precision than
(double), but upon being assigned the value must be squeezed
back into a (double) again. For developers, this can matter
when deciding when to simplify expressions. For example:


Okay, I'm going to take the bait here. How can the plus operation be
computed with greater precision than double?

(Example)

Some floating-point hardware works internally with 80-bit values,
whereas a double is 64 bits wide. This can lead to inconsistencies
when intermediate 80-bit results are written to memory as 64 bits
and then loaded again, compared with keeping the intermediate
values in the registers.

I was going to say that the expression b + c has type (double), but after
looking in the standard for confirmation of this, I'm confused:

6.3.1.8 Usual arithmetic conversions

"Unless explicitly stated otherwise, the common real type is also
the corresponding real type of the result"
[so the result of b + c would have type double -- MK]

Right.

but I'm confused by paragraph 2 and its footnote, which say

"The values of floating operands and of the results of floating
expressions may be represented in greater precision and range
than that required by the type; the types are not changed thereby. 52)"
and "52) The cast and assignment operators are still required to perform
their specified conversions as described in 6.3.1.4 and 6.3.1.5."

What's meant by this? If "the types are not changed thereby", does this
mean that (b + c) has type double, or not? And if the type is not changed,
what conversion would be necessary to do the assignment to a?

It means, even though the value is represented in greater range and
precision (than (double), for this case), the type is still (double).

The conversion for assignment to 'a' is 'a = (double) (b+c)'.
I know it seems weird that converting an expression to the same
type as the expression can change its value, but that's the rule.


Furthermore, if the result of a floating expression can be "represented
in greater precision and range" than that required, what does this say
about sizeof(b + c)? What can we predict about the value of the expression

sizeof(b + c) == sizeof(double)

in conforming implementations? Can a strictly conforming program rely on
this having the value 1?

The type of (b+c) is still double, even if the result value is
represented with greater range or precision. The sizeof
comparison you wrote is indeed always 1 (assuming b and c are
doubles).


Or is this "greater range and precision" clause merely giving
implementations permission to represent intermediate results in ways
that could give different results for more complicated floating
expressions, e.g. potentially giving different results for

((double)(b + c)) - ((double)(e * f))
vs.
(b + c) - (e * f)

where b, c, e, and f are all doubles?

Yes, the point is to give implementations more freedom for
intermediate results, and there is a good chance that these two
expressions will have different values, because casting to
(double) forces any extra range and/or precision of the two
intermediate values (that are operands to '-') to be discarded.