Re: Rounding errors
From: Robert Wagner (robert_at_wagner.net.yourmammaharvests)
Date: 08/27/04
- Next message: Robert Wagner: "Re: MaxDB = SAPDB = ADABAS (was: Sorts (revised)"
- Previous message: Robert Wagner: "Re: Rounding errors"
- In reply to: Richard: "Re: Rounding errors"
- Next in thread: Richard: "Re: Rounding errors"
- Reply: Richard: "Re: Rounding errors"
Date: Thu, 26 Aug 2004 22:58:19 GMT
On 26 Aug 2004 01:35:12 -0700, riplin@Azonic.co.nz (Richard) wrote:
>Robert Wagner <robert@wagner.net.yourmammaharvests> wrote
>> >You then truncated that to 3 digits and lost an average of .0005 per
>> >value reducing the average to the .4995 that the v999 numbers now
>> >actually add up to form.
>>
>> Alternatively, if I eliminated zeros (post-truncation), the average of
>> the other 99.9% would be .500.
>
>.000 is still a valid result from a random set of 3 digit numbers even
>if 0.0000000000000... is not.
>
>You are just manipulating the results to attempt to disguise the error
>you made.
Second attempt. I see the answer now. Change the pic to 9v999 and the
set size to 1001. Values start with .000 and end with .999, 1.000. The
average is 500.5/1001 = .500.
Why add 1.000? Because the output of a random number generator must be
rounded and, as you pointed out, the result can be 1.000.
>Take for example 10 random single digit numbers. If there was an even
>distribution (0 - 9) the average is 0.45 (try it).
Again, eleven values starting with .0 and ending with 1.0 average
5.5/11= .5
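The averages quoted so far are easy to check with exact arithmetic. The sketch below is in Python rather than Cobol, purely for convenience, and the helper name `grid_average` is made up for illustration. It averages every value on an evenly spaced grid using fractions, so no binary floating point is involved.

```python
from fractions import Fraction

def grid_average(first, last, step):
    """Exact average of first, first+step, ..., last (all given as strings)."""
    first, last, step = Fraction(first), Fraction(last), Fraction(step)
    count = (last - first) / step + 1
    assert count.denominator == 1, "last must lie on the grid"
    values = [first + step * k for k in range(int(count))]
    return sum(values) / len(values), len(values)

# .000 through 1.000 in steps of .001: 1001 values
print(grid_average("0.000", "1.000", "0.001"))
# .0 through 1.0 in steps of .1: eleven values
print(grid_average("0.0", "1.0", "0.1"))
# .0 through .9 in steps of .1: the ten-value set from the quoted message
print(grid_average("0.0", "0.9", "0.1"))
```

The first two sets average exactly 1/2, and the ten-value set averages 9/20 (.45), which agrees with both figures given above.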
>However, on what basis can you claim that you can dispense with one of
>the 10 digits in a random set ? It may give a 'better' result for
>your claim, but only if you ignore the consequences.
You're right, deleting zero is a mistake.
>> If PERFORM VARYING Frac FROM .001 BY .001 UNTIL Frac > .999,
>> the answer is 499.5/999 = .500.
Ok, I restored .000 and added 1.000. The result is the same.
Here is a more commonplace demo program. It takes familiar dollars and
cents and rounds them to dollars. Now the error in the average is .005
dollars, or 5,000 millionths of a dollar.
identification division.
program-id. test27b.
*> author. Robert Wagner.
*> Test rounding error
*> Results: 5000.0000 5000.0050
data division.
working-storage section.
01 unqualified-variables.
    05 Amt comp pic 99999v99.
    05 AmtRounded comp pic 99999.
    05 AmtTotal-1 value zero comp pic 9(14)v99.
    05 AmtTotal-2 value zero comp pic 9(14)v99.
    05 AmtCount value zero comp pic 9(14).
    05 AmtAverage-1 pic zzzzzz.9999.
    05 AmtAverage-2 pic zzzzzz.9999.
procedure division.
main.
    PERFORM VARYING Amt FROM ZERO BY .01 UNTIL Amt > 10000
        COMPUTE AmtRounded ROUNDED = Amt
        ADD Amt TO AmtTotal-1
        ADD AmtRounded TO AmtTotal-2
        ADD 1 TO AmtCount
    END-PERFORM
    COMPUTE AmtAverage-1 rounded = AmtTotal-1 / AmtCount
    COMPUTE AmtAverage-2 rounded = AmtTotal-2 / AmtCount
    DISPLAY AmtAverage-1 AmtAverage-2.
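For anyone without a Cobol compiler handy, here is a rough Python equivalent of test27b. It is a sketch, not a translation: the function name `dollars_and_cents_demo` is invented, amounts are held as integer cents, and ROUNDED is assumed to mean round half up, which for non-negative amounts is `(cents + 50) // 100`.

```python
from decimal import Decimal, ROUND_HALF_UP

def dollars_and_cents_demo(max_cents=1_000_000):
    """Mirror test27b: amounts 0.00 .. 10000.00 in steps of one cent."""
    total_cents = 0      # sum of the unrounded amounts, in cents
    total_dollars = 0    # sum of the amounts rounded to whole dollars
    count = 0
    for cents in range(max_cents + 1):
        total_cents += cents
        total_dollars += (cents + 50) // 100   # round half up (amounts are >= 0)
        count += 1
    q = Decimal("0.0001")                      # four decimal places, like the pic
    avg_exact = (Decimal(total_cents) / 100 / count).quantize(q, rounding=ROUND_HALF_UP)
    avg_rounded = (Decimal(total_dollars) / count).quantize(q, rounding=ROUND_HALF_UP)
    return avg_exact, avg_rounded

print(*dollars_and_cents_demo())   # 5000.0000 5000.0050
```

The two figures match the Results line in the program header: the 1,000,001 amounts sum to 5,000,005,000 dollars before rounding and 5,000,010,000 after.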
>There is no 'error' in rounding.
Yes, there IS an error in rounding. That's the point I'm making here.
>There is only an error in what you expect to happen.
I expect the average to remain the same. If it changes, rounding is
changing the numbers.
> Rounding is designed to take a set of arbitrary
>precision numbers and to give an accurate representation in a limited
>precision.
>
>ie if we have a set of random numbers in the form 9v9(lots) then the
>average will indeed be 0.5000 even if it includes 0.000000001.
>
>When that set is rounded to 2 digits of precision it only needs to
>look at the 3rd digit. The result will have exactly the same average
>as the original set at 0.50000, even if it included numbers that
>started 0.000.
That's how it should work. Rounding with Cobol raises the average to
.5005, as I demonstrated.
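The figures being argued over (.4995, .500 and .5005) can all be reproduced from the three-digit sets themselves. This is a sketch in Python, not Cobol; the helper name `averages` is made up, and rounding is assumed to be round half up on non-negative values.

```python
from fractions import Fraction

def averages(thousandths):
    """Return (average before, average after) rounding v999 values to v99."""
    values = list(thousandths)
    before = Fraction(sum(values), 1000 * len(values))
    # round half up to hundredths; valid because all values are non-negative
    after = Fraction(sum((k + 5) // 10 for k in values), 100 * len(values))
    return before, after

for label, ks in [(".001 to .999", range(1, 1000)),
                  (".000 to .999", range(0, 1000))]:
    before, after = averages(ks)
    print(label, float(before), round(float(after), 7))
```

With .000 excluded, the average goes from exactly .5 to 500/999 (about .5005). With .000 included, it goes from .4995 to exactly .5. The rounded total is 500.00 in both cases; only the count changes.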
>Because only 3 digits are needed to do the rounding it is actually
>only necessary to store these 3, the rest of the digits can be
>discarded.
Right, after rounding to 3 digits.
>However, the discarding of these digits from the 4th onwards loses the
>value of these. The average is, as I have shown, 0.4995. This
>doesn't matter to the rounding because the loss is of unrequired data.
No, if the random numbers are rounded to 3 digits, the average will be
.500. We need a sample size of at least 1001 because the values after
rounding can range from .000 to 1.000.
>Your error is that you expected the truncated 3 digit numbers to add
>up to the same as the original random numbers. And then you expected
>the rounded two digit numbers to have the same characteristics (total
>average) as the 3 digit numbers.
>
>Rounding is to recover the characteristics of the original large
>precision while working in limited precision.
>
>Your set of 3 digit numbers represents a loss of data which the
>rounding recovers. Come back when you can understand this.
This talk about random numbers is a straw man. The topic is the error
in Cobol rounding.
Readers can easily test this. Find a large file with currency amounts.
Do a sum and compute the rounded average out to four places. Now round
the numbers to whole units and compute their average. You will find
that it increased by about .005 units (.5% of one unit). If numbers are
computed rather than read from a file, the error will be less. See
demo below.
How much is your company's payroll? If the company has 10,000
employees, its payroll will be around $500M or 50B pennies. By
rounding each paycheck to the 'nearest' penny (not), the company might
be overpaying by $65,000.
Fix rounding to work right and you'll be a hero. Alternatively, put
the difference into your own paycheck and management will not see a
change in the total. Try to get a cell with internet access so we can
congratulate you for fulfilling an urban myth.
>One more time. Rounding is not designed to take a set of fixed
>precision numbers and reproduce the characteristics in a lesser
>precision. Your expectation is flawed.
>
>If you were to do this with 0.000001 BY 0.000001 UNTIL > .999999 then
>you will get a much closer result
Rounding six digits to five and six digits to two produced the same
result:
.500000000 .500000500
As before, 5 was added one position to the right of digits removed by
rounding.
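The six-digit run can be checked the same way. The Python sketch below (not Cobol; `rounded_average` is an invented name, and round half up is assumed) averages .000001 through .999999 after rounding to a chosen number of places.

```python
from fractions import Fraction

def rounded_average(keep_digits, total_digits=6):
    """Average of .000001 .. .999999 after rounding half up to keep_digits places."""
    scale = 10 ** (total_digits - keep_digits)   # size of the part rounded away
    n = 10 ** total_digits
    # each k is a value in millionths; the sum is in units of 10**-keep_digits
    total = sum((k + scale // 2) // scale for k in range(1, n))
    return Fraction(total, (n - 1) * 10 ** keep_digits)

for places in (5, 2):
    print(places, f"{float(rounded_average(places)):.9f}")
```

Both calls return 500000/999999, which prints as 0.500000500, so the sketch agrees that keeping five digits or two gives the same average for this set.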
Aha! Now I see the value of Standard Intermediate Data Item. With its
32-digit precision (average 16 right), rounding errors move from 4th
right of decimal to the 17th.
>For random numbers correctly in the range 0.0000...1 to 0.9999....
>
> large precision set -> average 0.500..
> rounded to 2 digits -> average 0.500..
> rounded to 1 digit -> average 0.500..
> rounded to 9 digits -> average 0.500..
> rounded to 0 digits -> average 0.500.. (half round to 1.0)
To see the error, your average must have one digit more than the
random numbers you're rounding.
When we're computing, how do we know the size of the intermediate
being rounded? I wrote this simple simulation of a payroll calculation
to measure that.
identification division.
program-id. test27c.
*> author. Robert Wagner.
*> Test rounding error on payroll calculation
*> Step interval is a prime to eliminate repeats
*> Results: 1705.4328000000 1705.4329297429
data division.
working-storage section.
01 unqualified-variables.
    05 Amt comp pic 99999v9(09).
    05 AmtRounded comp pic 99999v99.
    05 AmtTotal-1 value zero comp pic 9(09)v9(09).
    05 AmtTotal-2 value zero comp pic 9(09)v9(09).
    05 AmtCount value zero comp pic 9(10).
    05 Hours comp pic 9(03)v99.
    05 Rate comp pic 9(03)v99.
    05 StopOpt value zero comp pic 9(03)v99.
    05 AmtAverage-1 pic zzzzzz.9(10).
    05 AmtAverage-2 pic zzzzzz.9(10).
procedure division.
main.
    PERFORM VARYING Hours from 20 by .23 until Hours > 79
            AFTER Rate from 10 by .23 until Rate > 59
        COMPUTE Amt ROUNDED = Hours * Rate
        ADD StopOpt to Hours
        COMPUTE AmtRounded ROUNDED = Hours * Rate
        ADD Amt TO AmtTotal-1
        ADD AmtRounded TO AmtTotal-2
        ADD 1 TO AmtCount
    END-PERFORM
    COMPUTE AmtAverage-1 rounded = AmtTotal-1 / AmtCount
    COMPUTE AmtAverage-2 rounded = AmtTotal-2 / AmtCount
    DISPLAY AmtAverage-1 AmtAverage-2.
The average paycheck is off by about .00013 dollars (130 millionths of
a dollar). The compiler (Realia) is doing a pretty good job managing
the intermediate.
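A rough Python rendering of test27c, for readers who want to rerun it without Realia. Several things here are assumptions: the name `payroll_demo` is invented, PERFORM ... AFTER is treated as two nested loops with Rate innermost, ROUNDED is taken as round half up, and the StopOpt line is dropped because it adds zero. Hours and rates are kept as integer hundredths so the products are exact.

```python
from decimal import Decimal

def payroll_demo():
    """Mirror test27c using integers: hours and rates in hundredths."""
    total_exact = 0     # sum of Hours * Rate, in units of .0001
    total_cents = 0     # sum of Hours * Rate rounded half up to cents
    count = 0
    for hours in range(2000, 7901, 23):        # 20.00 .. 79.00 by .23
        for rate in range(1000, 5901, 23):     # 10.00 .. 59.00 by .23
            product = hours * rate             # exact, four decimal places
            total_exact += product
            total_cents += (product + 50) // 100
            count += 1
    avg_exact = Decimal(total_exact) / 10_000 / count
    avg_rounded = Decimal(total_cents) / 100 / count
    return count, avg_exact, avg_rounded

count, avg_exact, avg_rounded = payroll_demo()
print(count, avg_exact, avg_rounded)
```

The loops cover 257 hour values and 214 rate values, 54,998 paychecks in all, and the unrounded average works out to 1705.4328, the same as the first figure in the Results line. The rounded average comes out slightly higher; the sketch makes no claim to match the compiler's second figure digit for digit.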
>So how does this affect Cobol programs ?
>
>If it is required to do some divisions then the result will be an
>arbitrary number of fractional digits. If the result has to be
>reduced to 2 digits as being, say, cents, then it is only necessary to
>store 3 digits in order to determine how it should be rounded.
>
>Your assumption is that a million of those 3 digits should
>represent the characteristics of the full precision results. THEY
>DON'T. The rounded 2 digit results _DO_ represent the full precision
>results and it doesn't matter that the 3 digit truncations do not.
>
>That is: the rounding recovers the _correct_ result that the 3 digit
>truncation does not represent.
>
>You are correct that the 3 digit set and the 2 digit rounded set are
>different, but it is the rounded set that is correct.
If you had rounded the 3-digit set, it would have the same average as
the original numbers. Now, when you round three digits to two, how
does Cobol know whether it should 'recover' the original average or
leave it unchanged? It doesn't.
You're confusing an outright error in the rounding algorithm with
miraculous 'recovery' of missing data.
>Then don't use it for that application. It is a tool, if it doesn't
>fit then it is the wrong tool for _that_ job. It may be the correct
>tool for some other job.
When you publish data publicly, you may not be aware of how it will
be used.
>> Every time they say ROUNDED they're creating an error of 500 parts per
>> million. They've been doing it for 45 years.
>
>You still don't understand that it is the truncation to 3 digits that
>is out by 500 parts per million.
The demos I posted above don't do any truncation, yet the average
after rounding is higher than before rounding.
Truncation is a straw man. You set him up and then easily knock him
down.