Re: canonical conversion of float/double to strings
- From: Sigfried <sig.fried@xxxxxxxxxxx>
- Date: Wed, 04 Feb 2009 14:58:22 +0100
Thomas Pornin wrote:
java.lang.Double.toString(double x) converts a double value to a
decimal string which, when converted back with Double.parseDouble(),
must yield the same value (except for NaN, where a different NaN may
be obtained).
Additionally, the specification of that conversion is constrained so
that it is "canonical". That's what java.util.Formatter states, at
least: "For a canonical representation of the value, use
Float.toString(float) or Double.toString(double) as appropriate."
The specification mostly states that the decimal representation
should use the smallest number of digits after the decimal point, as
long as at least one digit remains and parseDouble() would recover
the original value.
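
As a minimal sketch of the round-trip property described above (the class and method names are illustrative, not taken from the Javadoc), one can check that the string produced by Double.toString() parses back to the same bit pattern. NaN is treated separately, since a different NaN may come back:

```java
class RoundTripCheck {
    // True if Double.toString(x) parses back to exactly the same double.
    // For NaN we only require that some NaN comes back.
    static boolean roundTrips(double x) {
        double back = Double.parseDouble(Double.toString(x));
        if (Double.isNaN(x)) {
            return Double.isNaN(back);
        }
        // Raw bits distinguish 0.0 from -0.0, which == would not.
        return Double.doubleToRawLongBits(x) == Double.doubleToRawLongBits(back);
    }

    public static void main(String[] args) {
        double[] samples = {
            0.1, 1.0, -0.0, 1e23,
            Double.MIN_VALUE, Double.MAX_VALUE,
            Double.NaN, Double.POSITIVE_INFINITY
        };
        for (double x : samples) {
            System.out.println(Double.toString(x) + " round-trips: " + roundTrips(x));
        }
    }
}
```

This only exercises the round-trip half of the specification; it says nothing about whether the string is the shortest possible one.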
This specification has a few subtleties; for instance, the decimal
exponent is chosen rather early in the conversion, which may result in
a longer encoding with a sequence of trailing nines. This is
discussed in this bug report:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4511638
I am NOT talking about that specific bug/feature here.
I have found that some values seem not to follow this specification
(neither the exact specification in the Javadoc comment, nor the amended
specification which is considered better in the comments of the bug
above).
For instance, consider 0x1.d2adc04837eddp60; this is converted to
the decimal string:
2.10173408806586701E18
but this shorter string would also be valid:
2.101734088065867E18
and Double.parseDouble() converts this string back to the exact
original double.
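
The claim above can be checked with a short program. This is only a sketch: the class and helper names are made up for illustration, and the string printed by Double.toString() depends on the JDK in use, so the code prints it rather than assuming it.

```java
import java.math.BigDecimal;

class ShorterStringExample {
    // The value from the post, written as a Java hexadecimal literal.
    static final double X = 0x1.d2adc04837eddp60;

    // True if the decimal string s parses to exactly the double 'expected'.
    static boolean parsesBackTo(String s, double expected) {
        return Double.doubleToLongBits(Double.parseDouble(s))
                == Double.doubleToLongBits(expected);
    }

    public static void main(String[] args) {
        // What this JDK prints (varies between JDK versions).
        System.out.println("Double.toString(X) = " + Double.toString(X));
        // Exact decimal expansion of the binary value, for reference.
        System.out.println("exact value        = " + new BigDecimal(X).toPlainString());
        // Both strings from the post should map back to X.
        System.out.println("longer string ok   = " + parsesBackTo("2.10173408806586701E18", X));
        System.out.println("shorter string ok  = " + parsesBackTo("2.101734088065867E18", X));
        // Dropping one more digit should no longer identify X.
        System.out.println("15 digits ok       = " + parsesBackTo("2.10173408806587E18", X));
    }
}
```

Doubles of this magnitude are spaced 256 apart, so any decimal within 128 of the exact value parses back to X; both strings above fall well inside that window.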
I am using Sun's JDK 1.6.0_10 on a PC running Linux in amd64 mode:
java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)
When I select random double values (i.e. uniform selection of sequences
of 64 bits, which I convert to double values with
Double.longBitsToDouble()), I find that about 0.29% of these values
match the conditions above (a shorter and equally reparseable decimal
representation exists, with the same exponent). These problematic values
are not uniformly distributed: they all seem to be between 1E16 and 1E19
(and similarly between -1E16 and -1E19).
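
A rough sketch of this kind of experiment follows. It is not the original test program: the names are hypothetical, java.util.Random stands in for whatever bit source was actually used, and "shorter" is taken to mean "one fewer digit after the decimal point, same exponent, at least one fractional digit left". Both the truncated and the rounded-up candidates are tried.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Random;

class ShorterFormScan {
    // True if 'printed' (a Double.toString-style string for d) can lose its
    // last fractional digit, keep the same exponent, and still parse to d.
    static boolean hasShorterForm(double d, String printed) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return false;
        }
        int e = printed.indexOf('E');
        String mantissa = (e < 0) ? printed : printed.substring(0, e);
        String suffix = (e < 0) ? "" : printed.substring(e);
        BigDecimal m = new BigDecimal(mantissa);
        int scale = m.scale();          // number of digits after the point
        if (scale < 2) {
            return false;               // at least one fractional digit must remain
        }
        RoundingMode[] modes = {RoundingMode.DOWN, RoundingMode.UP};
        for (RoundingMode mode : modes) {
            BigDecimal c = m.setScale(scale - 1, mode);
            // Skip candidates whose leading digit moved (e.g. 9.99 -> 10.0).
            if (c.precision() - c.scale() != m.precision() - m.scale()) {
                continue;
            }
            if (Double.parseDouble(c.toPlainString() + suffix) == d) {
                return true;
            }
        }
        return false;
    }

    // Counts random 64-bit patterns whose Double.toString output is reducible.
    static int countShorter(long seed, int samples) {
        Random rnd = new Random(seed);
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            double d = Double.longBitsToDouble(rnd.nextLong());
            if (hasShorterForm(d, Double.toString(d))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int hits = countShorter(42L, n);
        System.out.println(hits + " of " + n + " samples have a shorter form");
    }
}
```

The count depends on the JDK: a runtime whose Double.toString() already emits the shortest string would be expected to report no hits, while the figure quoted above applies to the JDK 1.6.0_10 build mentioned in the post.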
Is there something I have not understood properly? Is the decimal
representation computed by Double.toString() as canonical as Formatter
claims? Is there a bug in the implementation? Is this already known,
discussed and documented somewhere?
(As a side note, for a truly canonical representation, the specification
should also tell which decimal representation should be used when
several of identical length match; for some values, there is not even
a single nearest value.)
Who cares about trailing zeros? So you have found a bug in the Javadoc where "canonical" is used incorrectly? OK, fine, but are you hunting for random "bugs" like that in the docs, or does this come from an actual use case?