Re: BigNum -- Floating Point

From: Programmer Dude (Chris_at_Sonnack.com)
Date: 12/19/03


Date: Fri, 19 Dec 2003 11:25:46 -0600


"Arthur J. O'Dwyer" wrote:

> You're still missing *something*. Below, you say you are using
> a "decimal point"; i.e., base 10^k, whereas here you say you're
> using base 2^32. You can't have it both ways.

But I want it both ways! (-:

But all seriousness aside.... maybe if I just explain exactly what
I have in mind and give an example it will help. I get the feeling
I'm doing this in a non-standard way, which is what is confusing
people a little. Either I've glommed onto Something New, or I've
discovered Something Lame. (Hence my starting this thread! :-)

I'll use the real numbers, 42.5 and 3.14159. Internal storage
format is binary. The input conversion process *records* the
position of the DP and then, essentially, discards it. I'll
use the notation {dp=N} to indicate the stored location of the
DP. The 'N' is the number of decimal digits. Conversion, then,
looks like this:

    42.5 ==> 425 {dp=1} ==> 110101001 {dp=1}
     3.14159 ==> 314159 {dp=5} ==> 1001100101100101111 {dp=5}

Output conversion just reverses this.

Basically, all numbers are considered though they were integers.
The internal representation is really just a string of bits.
Those bits are packaged in 32-bit quantities in unsigned ints.
ALL MATH IS DONE IN BINARY ON A STRING OF BINARY BITS. Obviously,
the math is done in chunks. The first chunkage occurs in handling
the 32-bit quantities, but further chunkage occurs during some
operations. For example, addition breaks the 32-bits into 16-bit
quantities so I can detect overflow. (Yes, I realize I could use
a 64-bit long long or whatever, but I do not *yet* consider that
suffiently portable).

Multiplication and division are simple enough. Obviously the
string of digits you'd get by multiplying 42.5 x 3.14159 is
the same string you'd get from 425 x 314159. The only difference
is where the "DP" goes. But it's trivial (in mult and div) to
determine that:

in decimal (virtual):
    425 {dp=1} x 314159 {dp=5} ==> 133517575 {dp=6}

in binary (internal--really what's happening):
    110101001 {dp=1} x 1001100101100101111 {dp=5}
                           ==> 111111101010101000100000111 {dp=6}

All this works rather simply and nicely (elegantly!)

The problem comes from operations, like add, subtract, compare and
the logical ops. For these, the "dp" (a better term might be the
"digit positions") needs to be aligned. This does NOT work:

     425 {dp=1} + 314159 {dp=5} ==> 314584 {dp=??}

To make it work, one must adjust one of the numbers. To maintain
precision, one must *increase* the lesser. The increase needed
(because the *binary* encoding represents a *decimal* string of
digits) is a "x10" for each digit we must add. The difference in
the DP in the example is 4, thus 4 "x10"s are required (alternately
a single "x10000" operation--see below for some discussion on this).

What we need is:

     4250000 {dp=5} + 314159 {dp=5} ==> 4564159 {dp=5}

Regarding the x10-er...

I'm sure you understand that a "x10" operation of a binary number
can be done with [left-shift(x,1) + left-shift(x,3)]. Thus, if
one can come up with a blindingly fast arbitrary length bit shift
and an equally fast binary arbitrary length bit adder, this might
not be a *horribly* expensive operation.

To be determined is whether multiple x10 operations benefit from
increasing the multiplier (effectively adding more left shifts)
or if there's no real penalty in looping over a fast x10-er.

For reference, the number of shifts for various multiples of ten:
x10 xN
 2 2 10 = <<1 + <<3
 4 3 100 = <<2 + <<5 + <<6
 6 6 1000 = <<3 + <<5 + <<6 + <<7 + <<8 + <<9
 8 5 10000 = <<4 + <<8 + <<9 + <<10 + <<13
10 6 100000 = <<5 + <<7 + <<9 + <<10 + <<15 + <<16
12 7 1000000 = <<6 + <<9 + <<14 + <<16 + <<17 + <<18 + <<19
14 7 10000000 = <<7 + <<9 + <<10 + <<12 + <<19 + <<20 + <<23
16 12 100000000 = <<8 + <<13...<<16 + <<18 + <<20...<<24 + <<26
18 13 1000000000 = <<9 + <<11 + <<14 + <<15 + <<17 + <<19 + <<20
                    + <<23 + <<24 + <<25 + <<27 + <<28 + <<29

So it *seems* it's probably faster to implement multiple shifts
depending on the required magnitude increase. Until I design the
algorithms, I'm not sure if other effects will have an impact.

(FWIW, the shifting is done in 32-bit chunks. The adding is also
done in 32-bit chunks, but those are divided into two 16-bit chunks
to guarentee carry detection)

Does any of this help explain what I'm considering here?

> Are you really using *all* of the capacity of your unsigned-int
> digits, or are you taking each digit modulo (e.g.) 1,000,000,000
> during arithmetic operations? This is VERY IMPORTANT, and you're
> just going to confuse people if you don't straighten out the
> terminology!

*ASSUMING* the unsigned int is 32-bits, I'm using every bit of it.
If they are longer, I'm not using the extra.

The ONLY reason "decimal" enters into this is that "alignment" of
two numbers of different "DP" requires multiplication by ten.

>> (I use 'bigint' and 'bignum' for integers and reals.)
>
> Whatever. I guess 'bignum' is bad terminology, since "num" is
> ambiguous. I propose to split the difference and use "bigint,"
> "bigfrac," and "bigfloat" for the integers, ratio-method-bignums,
> and floating-point-method-bignums. I like changing around terms
> in the middle of a discussion, don't you? :-)

Not a problem! In my heart, they're bignums, but for this dicussion
and to avoid confusion, your terminology is fine. Since my design
is not quite floating point--and certainly isn't ratios--do you have
a term for my design? (-:

>> I guess I'm still not really clear--given the need to be able to
>> represent *all* digits of a number in the mantissa--on exactly
>> what an exponent gives me.
>
> As you gradually develop in your message, we are basically talking
> about the same thing. The 'exponent' is very closely related to
> what you call "dp" -- both are numbers that control the "scale" of
> the bigfloat's value. A linear change in 'exponent' corresponds to
> a geometric change in the bigfloat's value.

Yep. The difference here is that changing my "DP" moves the *decimal*
point around. E.g., 3.14159 {dp=5}. If I change DP to 3, the bit
string (1001100101100101111) now stands for 314.159.

The binary point/decimal point confusion comes from the disconnect
between having a DP represent the *decimal* point in a *binary*
representation of the (integer) decimal digit string. Internally,
the DP is used to align numbers for addition (et alii) or to calculate
the result DP of multiplication and division.

> Elsethread, Paul derisively commented that you seemed to be wanting
> to represent 2^10000, digit-by-digit, in 125K of memory; and you
> didn't quite seem to "get it." 125K is much, much too small to
> hold all the digits of 2^10000 *exactly*...

No, I didn't quite understand his question. But then he appears
to not have understood my design! (-:

> -- you'd need over (10^100)^10 kilobytes of memory to hold the
> whole number *exactly*, digit-by-digit! That's why the 'exponent'
> is so important -- it allows you to represent 2^10000 as
>
> (1.0) * (2^k)^(10000/k)
>
> in base 2^k, taking even less memory than it takes to store that
> line of text. Note that there's been NO loss of precision here;
> we're merely compressing a lot of excess zeroes into "exponent."
> And it can go the other way, so that we can represent 2^-10000
> just as efficiently.

THE PROBLEM IS... what happens when I add one to that? The answer
I want is:

19950631168807583848837421626835850838234968318861924548520089498529
43883022194663191996168403619459789933112942320912427155649134941378
11175937859320963239578557300467937945267652465512660598955205500869
18193311542508608460618104685509074866089624888090489894838009253941
63325785062156830947390255691238806522509664387444104675987162698545
32228685381616943157756296407628368807607322285350916414761839563814
58969463899410840960536267821064621427333394036525565649530603142680
23496940033593431665145929777327966577560617258203140799419817960737
82456837622800373028854872519008344645814546505579296014148339216157
34588139257095379769119277800826957735674444123062018757836325502728
32378927071037380286639303142813324140162419567169057406141965434232
46388012488561473052074319922596117962501309928602417083408076059323
20161268492288496255841312844061536738951487114256315111089745514203
31382020293164095759646475601040584584156607204496286701651506192063
10041864222759086709005746064178569519114560550682512504060075198422
61898059237118054444788072906395242548339221982707404473162376760846
61303377870603980341319713349365462270056316993745550824178097281098
32913144035718775247685098572769379264332215993998768866608083688378
38027643282775172273657572744784112294389733810861607423253291974813
12019760417828196569747589816453125843413595986278413012818540628347
66490886905210475808826158239619857701224070443305830758690393196046
03404973156583208672105913300903752823415539745394397715257455290510
21231094732161075347482574077527398634829849834075693795564663862187
45694992790165721037013644331358172143117913982229838458473344402709
64182851005072927748364550578634501100852987812389473928699540834346
15880704395911898581514577917714361969872813145948378320208147498217
18580113890712282509058268174362205774759214176537156877256149045829
04992461028630081535583308130101987675856234343538955409175623400844
88752616264356864883351946372037729324009445624692325435040067802727
38377553764067268986362410374914109667185570507590981002467898801782
71925953381282421954028302759408448955014676668389697996886241636313
37639390337345580140763674187771105538422573949911018646821969658165
14851304942223699477147630691554682176828762003627772577237813653316
11196811280792669481887201298643660768551639860534602297871557517947
38524636944692308789426594821700805112032236549628816903573912136833
83935917564187338505109702716139154395909915981546544173363116569360
31122249937969999226781732358023111862644575299135758175008199839236
28461524988108896023224436217377161808635701546848405862232979285387
56234865564405369626220189635710288123615675125433383032700290976686
50568557157505516727518899194129711337690149916181315171544007728650
57318955745092033018530484711381831540732405331903846208403642176370
39115506397890007428536721962809034779745333204683687958685802379522
18629120080742819551317948157624448298518461509704888027274721574688
13159475040973211508049819045580341682694978714131606321068639151168
1774304792596709377

> There's really no reason for 'exponent' to be a bigint unless
> you're a supergigamegacosmologist...

[grin] But I'm not convinced there's a reason for an exponent AT ALL!

> <about choice of base>
>> The reason is simple. No way to accurately convert 1.4 to a
>> binary point number. Using a decimal point, it's trivial.
>
> That's reasonable. But if you really want useful user-input
> things, I think bigfracs may be the way to go. They can handle
> 1.4 as if the user had entered 14/10, and can get 1/3 exactly
> too.

Yes, the ratio method is definitely one I need to consider!

I'm becoming ever more convinced the "IEEE FP" method (using an
exponent) does not meet my requirements.

>> It seems that, in my mind, "exponent" is bound to the idea that
>> the "mantissa" is truncated (sometimes), and this I seek to
>> avoid.
>
> Well, lose that preconception, then.

Okay. But I still haven't heard an answer to the question: if
my goal is to maintain *ALL* digits, what does an exponent buy me?
My sense is: nothing.

>> The **only** acceptable loss of precision occurs during a
>> division when the result exceeds a user-defined limit.
>
> Divisions such as 7/5, you mean?

No, such as 7/3. (-:

> I guess you mean any division where the divisor isn't a power
> of factors of 'Base'; in which case, couldn't you consider
> using a base of 60 or 5040, say, instead of that factor-poor
> 10 or 1000 you're using now?

Doesn't that only shift the problem around and *reduce* it?
For my money, that isn't an improvement.

> Note that 1.4 decimal is (1).(24) base 60, and 1/3 is (0).(20)
> base 60. You get the best of both worlds... until the user wants
> to compute 1/7, that is.

Exactly. Not an option.

> So, to recap my main points and questions: Your "dp" is closely
> related to the common term 'exponent.'

Yes, close, but not quite exactly so.

> What base are you using, *really*?

Internal operations are done in binary.

> Consider using bigfracs, and if that's still not attractive
> (this being your pet, after all), consider using a different
> base.

I'll be seriously considering bigfracs.
Alternate bases, not an option (AFAIKT).

-- 
|_ CJSonnack <Chris@Sonnack.com> _____________| How's my programming? |
|_ http://www.Sonnack.com/ ___________________| Call: 1-800-DEV-NULL  |
|_____________________________________________|_______________________|


Relevant Pages

  • Re: Exponential
    ... > In this code the exponential part using scientific notation becomes ... In C++ %e causes output of as many digits in the exponent as ... conversion to, say, a string, and then output that string. ...
    (comp.lang.cpp)
  • Re: How to convert extra long strings into their equivalent Hex Strings in VBA (Word 2K)
    ... numbers (upto 18 digits max) into its equivalent Hex String ... Public Function ExpressServiceCode(ByVal ServiceTag As String) As String ... 'the number dblTemp in the specified base, ... Dim lngTemp As Long ...
    (microsoft.public.vb.general.discussion)
  • Patterns in pi, copyright law, and philosophy
    ... Whether the offset of a string found in pi can be used as a form on ... then calculate how many digits of pi one would need to raise the ... "An infinite series of numbers is not an exhaustive set of numbers. ... finite string of digits occurs within the decimal expansion of pi, ...
    (sci.math)
  • Re: Patterns in pi, copyright law, and philosophy
    ... The digits of pi are widely believed to be "normal", in the sense that every n digit combination of digits is equally likely, and pass all reasonable tests of randomness. ... The mean length of the offset must at least be equal or greater than the length of the string you are looking for. ... Pi has an infinite representation as a decimal. ... finite string of digits occurs within the decimal expansion of pi, ...
    (sci.math)
  • Re: Required Field for 7 Numeric digits only
    ... Function IsNumber(ByVal Value As String) As Boolean ... > works with verifying that it has 7 digits and is a numeric filled> textbox ... > Private Sub TextBox1_KeyPress ... > Dim IsValid As Boolean ...
    (microsoft.public.excel.programming)