Re: Rounding errors

From: Richard (riplin_at_Azonic.co.nz)
Date: 08/25/04


Date: 25 Aug 2004 00:59:08 -0700

Robert Wagner <robert@wagner.net.yourmammaharvests> wrote

>>> Intuition tells us half the numbers will round up and half will
>>> round down. Given a very large sample, the average will remain
>>> .500 .. we think.
>
> >It may be what you had thought, but why do you assume that everyone
> >else thought that ?
>
> Because it's in the Cobol standard.

No. Wrong. Your comment was 'the average will remain .500.. we
think'.

I don't think that because the average of your example wasn't .500 in
the first place. You created a fantasy situation.

> >> Suppose we have a million random numbers formatted v999, adding up to
> >> 500,000.

It is fantasy because if there is an even distribution of all possible
values in a v999 it won't add up to 500,000. If it does add up to
that it isn't random.

If the random numbers were long enough then the total would be 500,000
(or close).

> >> Let's divide them into three groups: one containing rightmost
> >> digit of zero, a second containing 1-4 and a third containing 5-9.
> >> Let's round each group the Cobol way and sum the rounded numbers.
> >>
> >> Digit   Population   Sum
> >> 0       100,000       50,000
> >> 1-4     400,000      200,000 - 1,000 (-.0025 * 400,000)
> >> 5-9     500,000      250,000 + 1,500 (+.003 * 500,000)
> >> Total                500,500
> >
> >Your methodology is flawed. In fact you have made a gross statistical
> >error.
> >
> >The fault is that you have claimed they were 'random' when you have
> >contrived to truncate after the third digit. If they had not been
> >truncated and you had left the remaining additional digits in, say, a
> >v99999999 and then added them up you would have got very close to the
> >total of 500,500.
 
> You're all wet. It is intuitively obvious that collections of random
> numbers formatted v99, v999 or v9(infinity) will average .50000.

That may be 'intuitively obvious' to _you_, but it is 'intuitively
obvious to us that are numerate' that it is quite wrong.

If you have an even distribution of the numbers 0.000 to 0.999 with 3
digits then the average is _NOT_ 0.500. It is in fact 0.4995.

This is very easy to prove. Just add up the 1000 numbers from 000 to
999 and the total is 499500.

A short cut to this is that there are 500 pairs: the first is 000 +
999 -> 999, the second is 001 + 998 -> 999, ... the last pair is 499 +
500 -> 999.

   500 * 999 -> 499500

Similarly the average of:
    v9 0 - 9 is 0.45
    v99 00 - 99 is 0.495
    v999 000 - 999 is 0.4995
    v9999 0000 - 9999 is 0.49995
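
The averages in this table can be checked mechanically. A minimal sketch in Python (not the COBOL used elsewhere in this thread; the function name `average_of_all_values` is made up for illustration), summing every possible value at a given number of digits with exact fractions:

```python
from fractions import Fraction

def average_of_all_values(digits):
    """Average of every fraction representable with `digits` decimal places.

    Hypothetical helper: sums 0, 1, ..., 10**digits - 1 as integers and
    then scales the result back down to a fraction.
    """
    n = 10 ** digits
    total = sum(range(n))          # e.g. 499500 for digits = 3
    return Fraction(total, n * n)  # (total / n) values, each scaled by 1/n

for digits in (1, 2, 3, 4):
    print(f"v{'9' * digits}: {float(average_of_all_values(digits))}")
```

Each extra digit moves the average closer to 0.5, but it is always short by half of the last decimal place.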

Now it is true that given an arbitrarily large number of digits the
average will approach 0.50000 very closely.
         
The mistake you made is this: if you have a random distribution of
fractions with an arbitrary precision then the average will indeed be
.500000 or close to it.

You then truncated that to 3 digits and lost an average of .0005 per
value, reducing the average to the .4995 that the v999 numbers now
actually average.

Rounding either the original set of long numbers or the truncated set
(rounding only needs the third digit) restores the average to .5000.

> >So the flaw is not that the rounding at the third digit increased the
> >rounded total, but your truncation of the 4th and later digits
> >decreased the original total.
>
> You are wrong. Support this with an example.

To show that an even distribution of all fractions with 3 digits does
not average .5 :

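        *> Assumed declarations (not shown in the original): Frac must
        *> have an integer digit, e.g. PIC 9V999, or it can never exceed
        *> 0.999 and the loop will not end. FracTotal needs room for
        *> 499.500 and FracAverage needs four decimal places.
        MOVE ZERO TO FracTotal
        MOVE ZERO TO FracCount
        PERFORM VARYING Frac FROM 0.000 BY 0.001 UNTIL Frac > 0.999
             ADD Frac TO FracTotal
             ADD 1 TO FracCount
        END-PERFORM
        COMPUTE FracAverage = FracTotal / FracCount
        DISPLAY FracAverage

        -> 0.4995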

If you had started with a realistic set of random numbers between
0.000000000 and 0.999999999 (or longer) then the average would indeed
be close to 0.5. Truncating after the 3rd digit would reduce the
total of all these numbers by an average of 0.0005 per number.
The truncation should produce an even distribution of
the 3 digit values between 0.000 and 0.999 which _provably_ averages
to 0.4995 - exactly matching the loss of data.

When you round the numbers to two digits what existed beyond the 3rd
digit is irrelevant, so the truncation doesn't matter. The rounding
will recover the original average of .50.
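
The whole chain (long numbers, truncated to 3 digits, then rounded to 2) can be sketched as follows. This is an illustration in Python, not COBOL, and it makes two assumptions: it uses every 5-digit fraction once instead of a random sample so the result is exact, and it assumes ROUNDED means round-half-up. The name `three_averages` is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def three_averages(long_digits=5):
    """Return (average of long values, of truncated v999, of rounded v99)."""
    n = 10 ** long_digits
    # Every fraction 0.00000 .. 0.99999 exactly once: an even distribution.
    originals = [Decimal(i) / n for i in range(n)]
    # Truncate to three decimal places, as a MOVE to a v999 field would.
    truncated = [x.quantize(Decimal("0.001"), rounding=ROUND_DOWN)
                 for x in originals]
    # Round the truncated values half-up to two decimal places.
    rounded = [t.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
               for t in truncated]

    def mean(values):
        return sum(values) / len(values)

    return mean(originals), mean(truncated), mean(rounded)

long_avg, truncated_avg, rounded_avg = three_averages()
print(long_avg, truncated_avg, rounded_avg)
```

Under those assumptions the truncated average comes out at 0.4995 and the rounded average at 0.5, so the loss comes from the truncation and the rounding puts it back.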

> >> Here's another way of looking at what we did. We discarded the
> >> rightmost digit, producing numbers that look like v99. Then we left
> >> half of them unchanged and added .01 to the other half. By doing so,
> >> we increased the total by (.01 * 500,000) = 500.
>
> The rounded answer is incorrect.

No. The rounded answer is correct for a set of random numbers of
arbitrary length. The 3 digit truncated set of numbers doesn't
represent that set accurately, thus there are 3 averages:

  arbitrary length random -> .500000
  truncated 3 digit (your set) -> .4995
  rounded 2 digit set -> .500000

 
> There are two types of ignorance -- simple and volitional. The former
> simply doesn't know; the latter doesn't WANT to know and becomes
> hostile when you attempt to educate him. Most of us have encountered
> the latter type in our daily lives.

Indeed, we often encounter exactly this in your messages.

> >I have done systems that carry the rounding forward. That is when the
> >first number is rounded the difference is added to the next number
> >before that is rounded (or truncated, as preferred). This ensures
> >that the total is always correct rather than being randomly incorrect
> >by a small amount.
>
> Rounding intermediate results is THE classic beginner's mistake. It
> doesn't surprise me that you advocate it.

You obviously didn't read or didn't understand the mechanism I used
which does _not_ round intermediate results.

Given a set of dollar and cent values that add up to a certain total
it may be required to show this as dollars only (for example for total
sales by branch). If the values are truncated to dollars they won't
add up to the total. If they are rounded individually they may, or
may not, add up to the rounded exact total.

The way to correct this is to round each value and take the difference
between that and the pre-rounded value and add that to the next number
before rounding that.

There is _no_ 'rounding intermediate result', there is _no_
advocating. Your criticism is based on not reading my message.
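
For anyone who wants to see the carry-forward mechanism concretely, here is a minimal sketch. It is Python rather than the COBOL I actually used, the name `round_with_carry` and the sample figures are made up for illustration, and it assumes round-half-up on positive amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_with_carry(values, unit=Decimal("1")):
    """Round each value to `unit`, carrying the rounding difference forward."""
    carry = Decimal("0")
    rounded_values = []
    for value in values:
        adjusted = value + carry              # add what the last rounding left over
        rounded = adjusted.quantize(unit, rounding=ROUND_HALF_UP)
        carry = adjusted - rounded            # difference goes to the next value
        rounded_values.append(rounded)
    return rounded_values

# Hypothetical branch sales in dollars and cents; exact total is 102.00.
sales = [Decimal("10.50"), Decimal("20.50"), Decimal("30.50"), Decimal("40.50")]

independent = [s.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for s in sales]
carried = round_with_carry(sales)

print(sum(independent))  # each value rounded on its own
print(sum(carried))      # rounding difference carried forward
```

With these figures the independently rounded column adds up to 104 while the carried column adds up to 102, the exact total. The displayed values are the only things rounded; the exact amounts are never replaced.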

 
> The right way is to carry intermediate results to say six digits right
> of decimal and round them only when going to a report. The wrong way
> is to add rounded numbers into a total. I'm an autodidact but assume
> they used to teach this in Programming 101.

You are so sure that you are right that it never even occurs to you
that you might check your claims.

> >No. Wrong. "We" don't criticise floating-point for 'rounding errors'
> >at all. We criticise floating-point for not being able to represent
> >numbers exactly.
>
> Same thing. The error is in rounding a to .99999999 rather than 1.0.

No. Not the same thing at all. A binary floating point number cannot
represent most decimal fractions, such as 0.1, exactly, so a sum that
should be 1 can come out as .99999999. It isn't a 'rounding error' it
is a matter of precision. Rounding is the correction for this problem.
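
A small sketch of the representation problem, in Python and assuming IEEE 754 double precision for the binary case (the variable names are only illustrative). Ten additions of 0.1 are compared in binary floating point and in decimal arithmetic:

```python
from decimal import Decimal

# Binary floating point: 0.1 has no exact binary representation,
# so the stored value is slightly off and the error accumulates.
binary_total = 0.0
for _ in range(10):
    binary_total += 0.1

# Decimal arithmetic: 0.1 is represented exactly.
decimal_total = sum(Decimal("0.1") for _ in range(10))

print(binary_total == 1.0)             # False: the sum falls just short of 1
print(round(binary_total, 2) == 1.0)   # True: rounding corrects it
print(decimal_total == Decimal("1"))   # True: no correction needed
```

The inexactness is there before any rounding is done; the rounding is what repairs it.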

> Because I'm addressing more than one Cobol programmer.

I haven't seen any other Cobol programmer making these same errors.

> ---------------------------------------------------
> If someone else can show an error in my logic, without rancor, I'd be
> delighted to address his or her argument. Flammage offers neither
> information nor entertainment .. unless it's artful. The level of art
> in evidence here doesn't support the effort to respond.

There is no 'rancor', no flammage, in saying that you are wrong. You
obviously feel that you are being personally attacked by the mere
suggestion that you could ever make an error.

However, it was your claim that everyone else was wrong too, and that
we all made the assumptions that you did that started the rancor.

While I criticised your _methodology_ and said that your _conclusions_
were wrong, you responded with personal insults such as "You're all
wet", implied that I am ignorant by volition, and that I am a
beginner, and generally made ad hominem attacks.

> Succinctly, it's like 'pissing into the wind.'

I have noticed that attempting to educate you is exactly that.


