Re: Rounding errors

From: Robert Wagner (robert_at_wagner.net.yourmammaharvests)
Date: 08/27/04


Date: Thu, 26 Aug 2004 22:58:19 GMT

On 26 Aug 2004 01:35:12 -0700, riplin@Azonic.co.nz (Richard) wrote:

>Robert Wagner <robert@wagner.net.yourmammaharvests> wrote

>> >You then truncated that to 3 digits and lost an average of .0005 per
>> >value reducing the average to the .4995 that the v999 numbers now
>> >actually add up to form.
>>
>> Alternatively, if I eliminated zeros (post-truncation), the average of
>> the other 99.9% would be .500.
>
>.000 is still a valid result from a random set of 3 digit numbers even
>if 0.0000000000000... is not.
>
>You are just manipulating the results to attempt to disguise the error
>you made.

Second attempt. I see the answer now. Change the pic to 9v999 and the
set size to 1001. Values start with .000 and end with .999, 1.000. The
average is 500.5/1001 = .500.

Why add 1.000? Because the output of a random number generator must be
rounded and, as you pointed out, the result can be 1.000.
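
A minimal sketch of that 1001-value check, not part of the original
thread. The program name avg1001 and the field names are made up for
illustration, and it assumes a compiler that accepts free-format
source with *> comments. With .000 through 1.000 in the set, the total
is 500.500 over 1001 values, so the average should display as .5000.

```cobol
 identification division.
 program-id. avg1001.
*> Hypothetical sketch, not part of the original post.
*> Averages the 1001 values .000, .001, ... 1.000.
 data division.
 working-storage section.
 01 work-fields.
     05 Frac comp pic 9v999.
     05 FracTotal value zero comp pic 9(05)v999.
     05 FracCount value zero comp pic 9(05).
     05 FracAverage pic 9.9999.

 procedure division.
 main.
*>   Loop ends when Frac reaches 1.001, so 1.000 is included.
     PERFORM VARYING Frac FROM ZERO BY .001 UNTIL Frac > 1
          ADD Frac TO FracTotal
          ADD 1 TO FracCount
     END-PERFORM
     COMPUTE FracAverage ROUNDED = FracTotal / FracCount
     DISPLAY FracCount " values, average " FracAverage
     STOP RUN.
```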
 
>Take for example 10 random single digit numbers. If there was an even
>distribution (0 - 9) the average is 0.45 (try it).

Again, eleven values starting with .0 and ending with 1.0 average
5.5/11= .5

>However, on what basis can you claim that you can dispense with one of
>the 10 digits in a random set ? It may give a 'better' result for
>your claim, but only if you ignore the consequences.

You're right, deleting zero is a mistake.

>> If PERFORM VARYING Frac FROM .001 BY .001 UNTIL Frac > .999,
>> the answer is 499.5/999 = .500.

Ok, I restored .000 and added 1.000. The result is the same.

Here is a more commonplace demo program. It takes familiar dollars and
cents and rounds them to dollars. Now the error is .005 dollars, or
5,000 millionths of a dollar.

 identification division.
 program-id. test27b.
*> author. Robert Wagner.
*> Test rounding error
*> Results: 5000.0000 5000.0050
 data division.
 working-storage section.
 01 unqualified-variables.
     05 Amt comp pic 99999v99.
     05 AmtRounded comp pic 99999.
     05 AmtTotal-1 value zero comp pic 9(14)v99.
     05 AmtTotal-2 value zero comp pic 9(14)v99.
     05 AmtCount value zero comp pic 9(14).
     05 AmtAverage-1 pic zzzzzz.9999.
     05 AmtAverage-2 pic zzzzzz.9999.

 procedure division.
 main.
     PERFORM VARYING Amt FROM ZERO BY .01 UNTIL Amt > 10000
          COMPUTE AmtRounded ROUNDED = Amt
          ADD Amt TO AmtTotal-1
          ADD AmtRounded TO AmtTotal-2
          ADD 1 TO AmtCount
     END-PERFORM
     COMPUTE AmtAverage-1 rounded = AmtTotal-1 / AmtCount
     COMPUTE AmtAverage-2 rounded = AmtTotal-2 / AmtCount
     DISPLAY AmtAverage-1 AmtAverage-2.
 
>There is no 'error' in rounding.

Yes, there IS an error in rounding. That's the point I'm making here.

>There is only an error in what you expect to happen.

I expect the average to remain the same. If it changes, rounding is
changing the numbers.

> Rounding is designed to take a set of arbitrary
>precision numbers and to give an accurate representation in a limited
>precision.
>
>ie if we have a set of random numbers in the form 9v9(lots) then the
>average will indeed be 0.5000 even if it includes 0.000000001.
>
>When that set is rounded to 2 digits of precision it only needs to
>look at the 3rd digit. The result will have exactly the same average
>as the original set at 0.50000, even if it included numbers that
>started 0.000.

That's how it should work. Rounding with Cobol raises the average to
.5005, as I demonstrated.

>Because only 3 digits are needed to do the rounding it is actually
>only necessary to store these 3, the rest of the digits can be
>discarded.

Right, after rounding to 3 digits.

>However, the discarding of these digits from the 4th onwards loses the
>value of these. The average is, as I have shown, 0.4995. This
>doesn't matter to the rounding because the loss is of unrequired data.

No, if the random numbers are rounded to 3 digits, the average will be
.500. We need a sample size of at least 1001 because the values after
rounding can range from .000 to 1.000.

>Your error is that you expected the truncated 3 digit numbers to add
>up to the same as the original random numbers. And then you expected
>the rounded two digit numbers to have the same characteristics (total
>average) as the 3 digit numbers.
>
>Rounding is to recover the characteristics of the original large
>precision while working in limited precision.
>
>Your set of 3 digit numbers represents a loss of data which the
>rounding recovers. Come back when you can understand this.

This talk about random numbers is a straw man. The topic is the error
in Cobol rounding.

Readers can easily test this. Find a large file with currency amounts.
Do a sum and compute the rounded average out to four places. Now round
the numbers to whole units and compute their average. You will find
that it increased by about .005, that is, half a cent. If numbers are
computed rather than read from a file, the error will be less. See the
payroll demo below (test27c).
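
A rough sketch of that file test, not taken from the thread and built
on several assumptions: the file name amounts.dat, a record layout of
one amount per line as nine digits with an implied decimal point, and
a compiler that supports LINE SEQUENTIAL files (a common extension,
not standard COBOL) and free-format source. Adjust the record
description to match whatever file you actually have.

```cobol
 identification division.
 program-id. avgfile.
*> Hypothetical sketch, not part of the original post.
*> Assumed input: amounts.dat, one amount per line as nine
*> digits with an implied decimal point (pic 9(07)v99).
 environment division.
 input-output section.
 file-control.
     select AmtFile assign to "amounts.dat"
         organization is line sequential.
 data division.
 file section.
 fd AmtFile.
 01 AmtRecord.
     05 AmtIn pic 9(07)v99.
 working-storage section.
 01 work-fields.
*>   One extra digit so 9999999.50 and up can round without overflow.
     05 AmtRounded comp pic 9(08).
     05 AmtTotal-1 value zero comp pic 9(14)v99.
     05 AmtTotal-2 value zero comp pic 9(14)v99.
     05 AmtCount value zero comp pic 9(09).
     05 AmtAverage-1 pic z(08).9999.
     05 AmtAverage-2 pic z(08).9999.
     05 EndOfFile value "N" pic x.

 procedure division.
 main.
     OPEN INPUT AmtFile
     PERFORM UNTIL EndOfFile = "Y"
          READ AmtFile
               AT END
                    MOVE "Y" TO EndOfFile
               NOT AT END
                    COMPUTE AmtRounded ROUNDED = AmtIn
                    ADD AmtIn TO AmtTotal-1
                    ADD AmtRounded TO AmtTotal-2
                    ADD 1 TO AmtCount
          END-READ
     END-PERFORM
     CLOSE AmtFile
*>   Guard against an empty file before dividing.
     IF AmtCount > ZERO
          COMPUTE AmtAverage-1 ROUNDED = AmtTotal-1 / AmtCount
          COMPUTE AmtAverage-2 ROUNDED = AmtTotal-2 / AmtCount
          DISPLAY AmtAverage-1 " " AmtAverage-2
     END-IF
     STOP RUN.
```

The first figure is the average of the amounts as read, the second the
average after rounding each amount to whole units.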

How much is your company's payroll? If the company has 10,000
employees, its payroll will be around $500M or 50B pennies. By
rounding each paycheck to the 'nearest' penny (not), the company might
be overpaying by $65,000.

Fix rounding to work right and you'll be a hero. Alternatively, put
the difference into your own paycheck and management will not see a
change in the total. Try to get a cell with internet access so we can
congratulate you for fulfilling an urban myth.
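
One candidate for 'work right', offered as an assumption rather than
anything proposed in this thread: round ties to the nearest even digit
instead of always upward. The sketch below is test27b with the
ROUNDED MODE IS NEAREST-EVEN phrase from the COBOL 2014 standard,
which compilers predating that standard will not accept. The program
name test27e is made up, and free-format source is assumed.

```cobol
 identification division.
 program-id. test27e.
*> Hypothetical variant of test27b, not part of the original post.
*> Ties (.50) go to the nearest even dollar instead of always up.
*> Requires a compiler with COBOL 2014 ROUNDED MODE support.
 data division.
 working-storage section.
 01 unqualified-variables.
     05 Amt comp pic 99999v99.
     05 AmtRounded comp pic 99999.
     05 AmtTotal-1 value zero comp pic 9(14)v99.
     05 AmtTotal-2 value zero comp pic 9(14)v99.
     05 AmtCount value zero comp pic 9(14).
     05 AmtAverage-1 pic zzzzzz.9999.
     05 AmtAverage-2 pic zzzzzz.9999.

 procedure division.
 main.
     PERFORM VARYING Amt FROM ZERO BY .01 UNTIL Amt > 10000
          COMPUTE AmtRounded ROUNDED MODE IS NEAREST-EVEN = Amt
          ADD Amt TO AmtTotal-1
          ADD AmtRounded TO AmtTotal-2
          ADD 1 TO AmtCount
     END-PERFORM
     COMPUTE AmtAverage-1 ROUNDED = AmtTotal-1 / AmtCount
     COMPUTE AmtAverage-2 ROUNDED = AmtTotal-2 / AmtCount
     DISPLAY AmtAverage-1 AmtAverage-2
     STOP RUN.
```

Over this range the .50 ties split evenly between even and odd
dollars, so the two averages should come out equal. Real data with a
different mix of ties would not cancel so exactly.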

>One more time. Rounding is not designed to take a set of fixed
>precision numbers and reproduce the characteristics in a lesser
>precision. Your expectation is flawed.
>
>If you were to do this with 0.000001 BY 0.000001 UNTIL > .999999 then
>you will get a much closer result

Rounding six digits to five and six digits to two produced the same
result:
   .500000000 .500000500
As before, 5 was added one position to the right of digits removed by
rounding.
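
Where that extra 5 comes from can be checked on a single dollar's
worth of cents. The sketch below is not from the thread; the program
name roundbias and its field names are hypothetical, and free-format
source is assumed. It adds up the change ROUNDED makes to each of the
100 values .00 through .99.

```cobol
 identification division.
 program-id. roundbias.
*> Hypothetical sketch, not part of the original post.
*> Sums the change ROUNDED makes for each cents value .00 to .99.
 data division.
 working-storage section.
 01 work-fields.
     05 Cents comp pic 9v99.
     05 Dollars comp pic 9.
     05 NetChange value zero comp pic s9(03)v99.
     05 NetChangeOut pic -zz9.99.

 procedure division.
 main.
     PERFORM VARYING Cents FROM ZERO BY .01 UNTIL Cents > .99
          COMPUTE Dollars ROUNDED = Cents
*>        Negative when rounded down, positive when rounded up.
          COMPUTE NetChange = NetChange + Dollars - Cents
     END-PERFORM
     MOVE NetChange TO NetChangeOut
     DISPLAY "Net change over 100 values: " NetChangeOut
     STOP RUN.
```

Worked by hand, the values .01 to .49 lose 12.25 in total and the
values .50 to .99 gain 12.75, so the net is +.50 over 100 values, or
+.005 per value. The .50 case has no counterpart on the way down,
which is what pushes the average up.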

Aha! Now I see the value of Standard Intermediate Data Item. With its
32-digit precision (average 16 right), rounding errors move from 4th
right of decimal to the 17th.

>For random numbers correctly in the range 0.0000...1 to 0.9999....
>
> large precision set -> average 0.500..
> rounded to 2 digits -> average 0.500..
> rounded to 1 digit -> average 0.500..
> rounded to 9 digits -> average 0.500..
> rounded to 0 digits -> average 0.500.. (half round to 1.0)

To see the error, your average must have one digit more than the
random numbers you're rounding.

When we're computing, how do we know the size of the intermediate
being rounded? I wrote this simple simulation of a payroll calculation
to measure that.

 identification division.
 program-id. test27c.
*> author. Robert Wagner.
*> Test rounding error on payroll calculation
*> Step interval is a prime to eliminate repeats
*> Results: 1705.4328000000 1705.4329297429
 data division.
 working-storage section.
 01 unqualified-variables.
     05 Amt comp pic 99999v9(09).
     05 AmtRounded comp pic 99999v99.
     05 AmtTotal-1 value zero comp pic 9(09)v9(09).
     05 AmtTotal-2 value zero comp pic 9(09)v9(09).
     05 AmtCount value zero comp pic 9(10).
     05 Hours comp pic 9(03)v99.
     05 Rate comp pic 9(03)v99.
     05 StopOpt value zero comp pic 9(03)v99.
     05 AmtAverage-1 pic zzzzzz.9(10).
     05 AmtAverage-2 pic zzzzzz.9(10).

 procedure division.
 main.
     PERFORM VARYING Hours from 20 by .23 until Hours > 79
             AFTER Rate from 10 by .23 until Rate > 59
          COMPUTE Amt ROUNDED = Hours * Rate
          ADD StopOpt to Hours
          COMPUTE AmtRounded ROUNDED = Hours * Rate
          ADD Amt TO AmtTotal-1
          ADD AmtRounded TO AmtTotal-2
          ADD 1 TO AmtCount
     END-PERFORM
     COMPUTE AmtAverage-1 rounded = AmtTotal-1 / AmtCount
     COMPUTE AmtAverage-2 rounded = AmtTotal-2 / AmtCount
     DISPLAY AmtAverage-1 AmtAverage-2.

The average paycheck is off by about .00013 dollars (130 millionths of
a dollar). The compiler (Realia) is doing a pretty good job managing
the intermediate.

>So how does this affect Cobol programs ?
>
>If it is required to do some divisions then the result will be an
>arbitrary number of fractional digits. If the result has to be
>reduced to 2 digits as being, say, cents, then it is only necessary to
>store 3 digits in order to determine how it should be rounded.
>
>Your assumption is that a million of those 3 digits should
>represent the characteristics of the full precision results. THEY
>DON'T. The rounded 2 digit results _DO_ represent the full precision
>results and it doesn't matter that the 3 digit truncations do not.
>
>That is: the rounding recovers the _correct_ result that the 3 digit
>truncation does not represent.
>
>You are correct that the 3 digit set and the 2 digit rounded set are
>different, but it is the rounded set that is correct.

If you had rounded the 3-digit set, it would have the same average as
the original numbers. Now, when you round three digits to two, how
does Cobol know whether it should 'recover' the original average or
leave it unchanged? It doesn't.

You're confusing an outright error in the rounding algorithm with
miraculous 'recovery' of missing data.

>Then don't use it for that application. It is a tool, if it doesn't
>fit then it is the wrong tool for _that_ job. It may be the correct
>tool for some other job.

When you publish data publicly, you may not be aware of how it will
be used.

>> Every time they say ROUNDED they're creating an error of 500 parts per
>> million. They've been doing it for 45 years.
>
>You still don't understand that it is the truncation to 3 digits that
>is out by 500 parts per million.

The demos I posted above don't do any truncation, yet the average
after rounding is higher than before rounding.

Truncation is a straw man. You set him up and then easily knock him
down.


