Re: float limits
From: Jack Klein (jackklein_at_spamcop.net)
Date: 08/26/04
- In reply to: ziller: "float limits"
- Next in thread: Joe Wright: "Re: float limits"
- Reply: Joe Wright: "Re: float limits"
Date: Wed, 25 Aug 2004 21:51:15 -0500
On 25 Aug 2004 17:06:31 -0700, ziller@gmail.com (ziller) wrote in
comp.lang.c:
> Why is it that FLT_DIG (from <float.h>) is 6 while DBL_DIG is 15?
Because that is what the implementation documents that it provides, as
required by the C standard. FLT_DIG and DBL_DIG are required to be at
least 6 and 10 respectively.
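A minimal sketch of how one might check those minimums on a given implementation. The function name dig_meets_minimums is made up for illustration; only FLT_DIG and DBL_DIG come from <float.h>.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if this implementation's FLT_DIG and
   DBL_DIG meet the minimums the C standard requires (6 and 10),
   0 otherwise. A conforming implementation always returns 1. */
int dig_meets_minimums(void)
{
    return FLT_DIG >= 6 && DBL_DIG >= 10;
}
```

Printing FLT_DIG and DBL_DIG with printf("%d") shows the values the implementation actually documents, 6 and 15 in the case being asked about.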
> Doing the math, the mantissa for floats is 24 bits = 2^24-1 max value
> = 16,777,215.0f. Anything 8-digit odd # greater than that will be
> rounded off.
> For doubles, the mantissa is 53 bits = 2^53-1 max value =
> 9,007,199,254,740,991.0l (that's an L). So 16 digit odd numbers
> greater than that will be rounded off. To get the actual precision we
> take log(base 10) of those numbers and get 7.22 and 15.95
> respectively.
>
> ...floats have greater than 7 digits precision and doubles only
> greater than 15 digits. So how does MS guarantee no rounding errors
> for 15 digit doubles yet 6 digit floats (if I understand correctly,
> the last digit of precision must be used to round off the number...the
> numbers are not just truncated at 7 & 15 digits...)
>
> Anything I'm missing for the doubles case? It looks like they should
> be guaranteeing 14 digits.
What you are missing is that the C standard imposes no requirements
for "no rounding errors". In fact rounding errors are guaranteed in
almost all floating point operations.
The definition of those terms is spelled out clearly in the C standard,
and it makes no promise about rounding errors in arithmetic. Basically,
these values represent the number of decimal digits that are guaranteed
to survive conversion into the floating point type and back unchanged.
If FLT_DIG is 6, that means that any integral value in the range of
-999,999 to +999,999 can be placed into a float and then into a large
enough integer type and the result will be exactly the same as the
original number.
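The round trip described above can be tried directly. This is only a sketch: the function names are invented for illustration, and the volatile qualifier is there on the assumption that some compilers would otherwise keep the value in a wider register.

```c
/* Returns 1 if n survives long -> float -> long unchanged, else 0. */
int roundtrips_through_float(long n)
{
    volatile float f = (float)n;   /* volatile forces a real float store */
    return (long)f == n;
}

/* Counts the integers in [-limit, +limit] that do NOT survive the
   round trip. With FLT_DIG == 6 and limit == 999999 this should be 0. */
long count_float_failures(long limit)
{
    long n, failures = 0;

    for (n = -limit; n <= limit; ++n)
        if (!roundtrips_through_float(n))
            ++failures;
    return failures;
}
```

Calling count_float_failures(999999L) walks the whole 6 digit range. On an implementation with 24 bits of float precision, roundtrips_through_float(16777217L) is the first positive value that fails.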
If DBL_DIG is 15, that means any integral value in the range
-999,999,999,999,999 to 999,999,999,999,999 can be placed into a
double and then into a large enough integer type (if one exists) and
the result will be exactly the same as the original value.
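The same check for double, sketched under the assumption that the implementation provides long long as its "large enough integer type". The 15 digit range is too big to walk exhaustively, so the second function only samples it with a caller-supplied step; both function names are hypothetical.

```c
/* Returns 1 if n survives long long -> double -> long long unchanged. */
int roundtrips_through_double(long long n)
{
    volatile double d = (double)n;   /* volatile forces a real double store */
    return (long long)d == n;
}

/* Samples [-limit, +limit] every `step` values (step must be > 0) and
   counts the sampled integers that do NOT survive the round trip. */
long count_double_failures(long long limit, long long step)
{
    long long n;
    long failures = 0;

    for (n = -limit; n <= limit; n += step)
        if (!roundtrips_through_double(n))
            ++failures;
    return failures;
}
```

With DBL_DIG equal to 15, count_double_failures(999999999999999LL, 999999937LL) should report no failures, while a 16 digit value such as 9007199254740993LL does not survive on an implementation with 53 bits of double precision.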
Nowhere is there any guarantee about rounding in calculations.
If I assume that you mean Microsoft's 32-bit x86 implementations, your
bit counts deserve one qualification. The Intel FPU single and double
precision types store 23 and 52 mantissa bits respectively; the leading
1 bit of a normalized value is implied rather than stored, which gives
the 24 and 53 bits of precision you used.
That makes every integer of magnitude up to 16,777,216 (2^24) exactly
representable in a float, and up to 9,007,199,254,740,992 (2^53) in a
double. There are 8 decimal digit numbers beyond the former, and 16
digit numbers beyond the latter, that cannot be represented exactly.
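For reference, the formula the standard gives for these macros on a binary type with p bits of precision is floor((p - 1) * log10(2)), where p is what FLT_MANT_DIG and DBL_MANT_DIG report (they count the implied leading bit of the Intel formats). The sketch below uses a hypothetical function name and approximates log10(2) as 30103/100000 in integer arithmetic so that it needs nothing from <math.h>; that approximation is my assumption and is only meant for the precisions discussed here.

```c
#include <float.h>

/* floor((p - 1) * log10(2)) for a binary floating type with p bits of
   precision. log10(2) is approximated as 30103/100000, which is close
   enough for p = 24 and p = 53 but is not a general-purpose formula. */
int decimal_digits_for_bits(int p)
{
    return (int)(((long)(p - 1) * 30103L) / 100000L);
}
```

This gives 6 for 24 bits and 15 for 53 bits, matching the FLT_DIG and DBL_DIG values in question. The p - 1 is why the plain log10 of the largest mantissa value (7.22 and 15.95) overstates what can be guaranteed.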
<off-topic>
If you want to understand the actual format of Intel floating point
representations, you can download the documentation for free from
http://developer.intel.com. If you do, don't bother looking at the 80
bit extended precision format. Microsoft has decided that you aren't
qualified to use that format at the expense of "compatibility" among
Windows versions on various processors.
Here's a quote from Microsoft:
With the 16-bit Microsoft C/C++ compilers, long doubles are stored as
80-bit (10-byte) data types. Under Windows NT, in order to be
compatible with other non-Intel floating point implementations, the
80-bit long double format is aliased to the 64-bit (8-byte) double
format.
The complete web page may be found at:
http://support.microsoft.com/default.aspx?scid=kb;en-us;129209
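One way to see whether a given compiler treats long double this way is to compare the precision macros from <float.h>. This is just a sketch with a made-up function name; it reports what the implementation documents and nothing more.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if long double has the same number of
   mantissa digits as double on this implementation, which is what the
   aliasing described in the quote would look like, and 0 otherwise. */
int long_double_same_as_double(void)
{
    return LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

A result of 0 means long double carries more precision than double; with a binary radix, 64 digits would indicate the 80 bit extended format.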
</off-topic>
-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html