Re: Rounding errors

From: Richard (riplin_at_Azonic.co.nz)
Date: 08/26/04


Date: 26 Aug 2004 01:35:12 -0700

Robert Wagner <robert@wagner.net.yourmammaharvests> wrote

> The spoiler is the inclusion of zero. If we take 999 numbers between
> .001 and .999, the average is .500. See below.
 
> >The mistake you made is that if you have a random distribution of
> >fractions with an arbitrary precision then the average will indeed be
> >.500000 or close to it.
>
> That's because a fraction will not produce zero. Zero isn't a rational
> number, it's a limit. As x approaches infinity, 1/x approaches zero.

Whatever. You will still get numbers that are less than .001 that when
truncated to 3 digits will be .000.
     
> >You then truncated that to 3 digits and lost an average of .0005 per
> >value reducing the average to the .4995 that the v999 numbers now
> >actually add up to form.
>
> Alternatively, if I eliminated zeros (post-truncation), the average of
> the other 99.9% would be .500.

.000 is still a valid result from a random set of 3 digit numbers even
if 0.0000000000000... is not.

You are just manipulating the results to attempt to disguise the error
you made.
 
Take for example 10 random single digit fractions. If there was an even
distribution (.0 - .9) the average is 0.45 (try it). Now you can
contrive to say that the nine numbers .1 - .9 average to 0.50.

However, on what basis can you claim that you can dispense with one of
the 10 digits in a random set ? It may give a 'better' result for
your claim, but only if you ignore the consequences.

          .1  .2  .3  .4  .5  .6  .7  .8  .9   total 4.5   average .5
rounded    0   0   0   0   1   1   1   1   1   total 5     average 0.555..

However, if you take a set of large precision random numbers between
.10000 and .99999.. the average will be 0.55.. not 0.5000.
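
The single digit arithmetic above is easy to check. A minimal sketch, in Python rather than Cobol for brevity; the names are mine, and `Fraction` is used only so the averages stay exact:

```python
from fractions import Fraction

# The ten single digit fractions .0 to .9, held exactly.
tenths = [Fraction(d, 10) for d in range(10)]

def average(values):
    return sum(values, Fraction(0)) / len(values)

def round_half_up(value):
    # Round to 0 digits: .5 and above goes to 1, anything below goes to 0.
    return Fraction(1) if value >= Fraction(1, 2) else Fraction(0)

avg_with_zero = average(tenths)                  # .0 to .9 -> 9/20 = 0.45
avg_without_zero = average(tenths[1:])           # .1 to .9 -> 1/2  = 0.50
avg_rounded = average([round_half_up(t) for t in tenths[1:]])  # 5/9 = 0.555..

print(float(avg_with_zero), float(avg_without_zero), float(avg_rounded))
```

Dropping the zero moves the average from 0.45 to 0.50, but the rounded values of that reduced set then average 5/9, which matches a continuous range of .1 up to 1 (average 0.55) and not 0.50.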
           

> Assume my test set contained no zeros, the sum was 500,000, the
> average was .500. After rounding was applied, the sum became 500,500.
> This demonstrates that rounding introduced an error, an upward bias.

No. It demonstrates that you don't understand random numbers. If you
eliminate all the 3 digit zeros (that derive from the random numbers
0.00000..1 to 0.00099999..) then you would have replaced them with
some other numbers. If this was done in a random way then the total
will be 500,000 as you say (up from 499,500). The total of the full
arbitrary length untruncated actual random numbers, though, will be
the 500,000 plus 0.1% for the 0.1% of the numbers you have replaced ie
500,500.

The rounding to 2 digits will restore the full count 500,500.

> Suppose I added a million zeros to that set. The sum would still be
> 500,000 but the average would be .250.

That is because the set is getting larger by that count. Above, you
are not changing the count (ie 1 million) but are changing some values
from .000 to something else and then _NOT_ adding the new values to
the total.

> If PERFORM VARYING Frac FROM .001 BY .001 UNTIL Frac > .999,
> the answer is 499.5/999 = .500.

Well, sure and if you leave off the .999 because you don't like that
one you will get another total, again.

 
> Thank you for providing a framework that demonstrates the error in
> Cobol rounding without the issue of truncating random numbers.

There is no 'error' in rounding. There is only an error in what you
expect to happen. Rounding is designed to take a set of arbitrary
precision numbers and to give an accurate representation in a limited
precision.

ie if we have a set of random numbers in the form 9v9(lots) then the
average will indeed be 0.5000 even if it includes 0.000000001.

When that set is rounded to 2 digits of precision it only needs to
look at the 3rd digit. The result will have exactly the same average
as the original set at 0.50000, even if it included numbers that
started 0.000.

Because only 3 digits are needed to do the rounding it is actually
only necessary to store these 3, the rest of the digits can be
discarded.

However, the discarding of these digits from the 4th onwards loses the
value of these. The average of the truncated 3 digit set is, as I have
shown, 0.4995. This doesn't matter to the rounding because the loss is
of unrequired data.
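
The claim that three stored digits are enough can be checked directly. A minimal sketch in Python (not Cobol), assuming round-half-up and using a 5 digit grid as a stand-in for "arbitrary precision"; the function names are hypothetical:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

THOUSANDTH = Decimal("0.001")
HUNDREDTH = Decimal("0.01")

def round_direct(value):
    # Round the full precision value straight to 2 digits.
    return value.quantize(HUNDREDTH, rounding=ROUND_HALF_UP)

def round_via_three_digits(value):
    # Keep only 3 digits (truncate), then round that to 2 digits.
    stored = value.quantize(THOUSANDTH, rounding=ROUND_DOWN)
    return stored.quantize(HUNDREDTH, rounding=ROUND_HALF_UP)

# Every 5 digit value from 0.00000 to 0.99999.
samples = [Decimal(n) / Decimal(100000) for n in range(100000)]
mismatches = [v for v in samples
              if round_direct(v) != round_via_three_digits(v)]
print(len(mismatches))
```

No value in the sample rounds differently by the two routes, because with half-up rounding the decision depends only on whether the 3rd digit is 5 or more.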

Your error is that you expected the truncated 3 digit numbers to add
up to the same as the original random numbers. And then you expected
the rounded two digit numbers to have the same characteristics (total,
average) as the 3 digit numbers.

Rounding is to recover the characteristics of the original large
precision while working in limited precision.

Your set of 3 digit numbers represents a loss of data which the
rounding recovers. Come back when you can understand this.

> This program rounds an evenly distributed set of numbers and displays
> the average with and without rounding. The upward bias caused by
> rounding is as predicted.
>
> identification division.
> program-id. test27.
> *> author. Robert Wagner.
> *> Test rounding error
> *> To insure the same number of rounds up and down, 499 each,
> *> do not round .999.
> *> Findings: .5000 .5004
> data division.
> working-storage section.
> 01  unqualified-variables.
>     05  Frac           pic 9v999.
>     05  FracRounded    pic 9v99.
>     05  FracTotal-1    value zero pic 99999v999.
>     05  FracTotal-2    value zero pic 99999v999.
>     05  FracCount      value zero pic 9999.
>     05  FracAverage-1  pic zz.9999.
>     05  FracAverage-2  pic zz.9999.
>
> procedure division.
> main.
>     PERFORM VARYING Frac FROM .001 BY 0.001 UNTIL Frac > .999
>         COMPUTE FracRounded ROUNDED = Frac
>         ADD Frac TO FracTotal-1
>         IF Frac = .999
>             ADD Frac TO FracTotal-2
>         ELSE
>             ADD FracRounded TO FracTotal-2
>         END-IF
>         ADD 1 TO FracCount
>     END-PERFORM
>     COMPUTE FracAverage-1 = FracTotal-1 / FracCount
>     COMPUTE FracAverage-2 = FracTotal-2 / FracCount
>     DISPLAY FracAverage-1 FracAverage-2.
>
> Result: .5000 .5004

One more time. Rounding is not designed to take a set of fixed
precision numbers and reproduce the characteristics in a lesser
precision. Your expectation is flawed.
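
For anyone who wants to reproduce the quoted figures without a Cobol compiler, here is a rough Python transcription of test27. It assumes that ROUNDED rounds half up and that a COMPUTE without ROUNDED truncates into the zz.9999 fields; the function and variable names are mine:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def test27():
    total_exact = Decimal(0)      # FracTotal-1
    total_rounded = Decimal(0)    # FracTotal-2
    count = 0                     # FracCount
    for n in range(1, 1000):      # Frac = .001, .002, ... .999
        frac = Decimal(n) / 1000
        frac_rounded = frac.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        total_exact += frac
        # The original program deliberately leaves .999 unrounded.
        total_rounded += frac if frac == Decimal("0.999") else frac_rounded
        count += 1
    four_digits = Decimal("0.0001")
    average_1 = (total_exact / count).quantize(four_digits, rounding=ROUND_DOWN)
    average_2 = (total_rounded / count).quantize(four_digits, rounding=ROUND_DOWN)
    return average_1, average_2

print(test27())
```

Under those assumptions it gives 0.5000 and 0.5004, the same as the quoted result. The totals are 499.500 and 499.999 over 999 values.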

If you were to do this with 0.000001 BY 0.000001 UNTIL > .999999 then
you will get a much closer result.
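
A minimal sketch of that comparison, in Python rather than Cobol and assuming round-half-up on every value (including the last one); `rounding_bias` is a hypothetical helper and works in integers so the result is exact:

```python
from fractions import Fraction

def rounding_bias(digits):
    """Average of the values rounded to 2 digits, minus the average of the
    original values n / 10**digits for n = 1 .. 10**digits - 1 (digits >= 3)."""
    scale = 10 ** digits
    step = scale // 100                  # grid points per hundredth
    count = scale - 1
    exact_total = Fraction(sum(range(1, scale)), scale)
    # Round half up to hundredths using integer arithmetic only.
    rounded_hundredths = sum((n + step // 2) // step for n in range(1, scale))
    rounded_total = Fraction(rounded_hundredths, 100)
    return (rounded_total - exact_total) / count

print(float(rounding_bias(3)))   # step .001
print(float(rounding_bias(6)))   # step .000001
```

With a step of .001 the difference between the two averages is 1/1998, about .0005. With a step of .000001 it is 1/1999998, about .0000005. The "bias" shrinks with the grid spacing of the input, not with anything the rounding does.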

For random numbers correctly in the range 0.0000...1 to 0.9999....

      large precision set -> average 0.500..
      rounded to 2 digits -> average 0.500..
      rounded to 1 digit -> average 0.500..
      rounded to 9 digits -> average 0.500..
      rounded to 0 digits -> average 0.500.. (half round to 1.0)

      truncated to 3 digits -> average 0.4995 (as demonstrated)
      truncated to 2 digits -> average 0.495
      truncated to 1 digit -> average 0.45

So now you can complain that rounding is 'out' by 500, or 5000, or 50000
on the same data.

But it is not the _rounding_ that is wrong, it is the truncation to
3, or 2 or 1 digit. That truncated set no longer represents the total
value of the random numbers, but the rounded set does.
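
The table above can be reproduced without a random number generator. A minimal sketch in Python (not Cobol): as a stand-in for the large precision set it uses the 100,000 evenly spaced midpoints (2n + 1) / 200000, which average exactly 0.5 and never land on a rounding tie. That choice, and the names, are my assumptions:

```python
from fractions import Fraction

PRECISION = 5                 # digits in the stand-in "large precision" set
COUNT = 10 ** PRECISION
DENOM = 2 * COUNT             # sample n is the midpoint (2n + 1) / DENOM

def average(digits, rounded):
    """Exact average after rounding (half up) or truncating to `digits`."""
    scale = 10 ** digits
    half = DENOM // 2 if rounded else 0      # adding 1/2 before the floor rounds
    total = sum(((2 * n + 1) * scale + half) // DENOM for n in range(COUNT))
    return Fraction(total, COUNT * scale)

for digits in (2, 1, 0):
    print("rounded to", digits, "->", float(average(digits, rounded=True)))
for digits in (3, 2, 1):
    print("truncated to", digits, "->", float(average(digits, rounded=False)))
```

On this set every rounded average is exactly 0.5, while the truncated averages are 0.4995, 0.495 and 0.45.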

So how does this affect Cobol programs ?

If it is required to do some divisions then the result will be an
arbitrary number of fractional digits. If the result has to be
reduced to 2 digits as being, say, cents, then it is only necessary to
store 3 digits in order to determine how it should be rounded.

Your assumption is that a million of those 3 digit numbers should
represent the characteristics of the full precision results. THEY
DON'T. The rounded 2 digit results _DO_ represent the full precision
results and it doesn't matter that the 3 digit truncations do not.

That is: the rounding recovers the _correct_ result that the 3 digit
truncation does not represent.

You are correct that the 3 digit set and the 2 digit rounded set are
different, but it is the rounded set that is correct.

> I understand. You are propagating rounding backward into detail so
> that the details sum to the (rounded) total. You can still be off by 1
> .. unless you add the last difference into the total.

No. Wrong. The mechanism cannot be off by 1; it will only be off by
+-.5 or less and this will be _exactly_ the same as the rounding error
when rounding the total.
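
The mechanism itself is not shown in this message. One common way to get that behaviour, offered only as an assumption about what is meant, is to round the running total and report each detail as the difference between successive rounded totals. A minimal Python sketch (not Cobol) with a hypothetical `round_details` helper:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_details(values, quantum=Decimal("0.01")):
    """Round each detail so that the rounded details always sum to the
    rounded running total (running total method; an assumed mechanism)."""
    running = Decimal(0)
    previous_rounded = Decimal(0)
    details = []
    for value in values:
        running += value
        rounded_running = running.quantize(quantum, rounding=ROUND_HALF_UP)
        details.append(rounded_running - previous_rounded)
        previous_rounded = rounded_running
    return details

values = [Decimal("0.333")] * 3
adjusted = round_details(values)
independent = [v.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP) for v in values]
print(adjusted, sum(adjusted))
print(independent, sum(independent))
```

Here the adjusted details are 0.33, 0.34, 0.33 and sum to 1.00, which is the true total 0.999 rounded. Rounding each detail independently gives 0.99. The only error left in the adjusted column is the error of rounding the total itself.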

> In the financial data warehouse industry, we deal with this all the
> time -- reports that don't quite add up to the total. The worst-case
> error is plus or minus the number of detail lines. It's not a big
> deal. Our numbers are in thousands rather than currency units. For
> Brazil, they're in millions.
>
> Your system would screw us up. On one report you might say you are
> holding 1,000 (thousands of Euros) in XYZ. On the next report the same
> holding might be reported as 1,001. We would think you bought one.
> Some analytical methods are based on the number of 'decisions'. They
> don't care about quantity bought or sold. Your rounding correction
> would falsely count as a decision.

Then don't use it for that application. It is a tool, if it doesn't
fit then it is the wrong tool for _that_ job. It may be the correct
tool for some other job.
 
> Moreover, we measure turnover rate in a portfolio -- the number of
> changes since the previous report. Your corrections would make it look
> like you are 'churning' i.e. trading more than your peers. That would
> make you look bad to potential investors.

It wasn't compulsory. Unlike you I usually don't claim something is
'best practice' when it may be inappropriate.
 

> No system of numeration can store all real numbers exactly. Integers
> cannot represent pi, the square root of 2 and other irrational
> numbers. Nor can they express many rational fractions as a single
> number, for example 1/3.
 
> This discussion is about rounding.

Then discuss rounding in floating-point as a mechanism, for example:

          if ( round(a, 2) == 1.00 )

rather than confusing it with precision, which is what you did.

> The function of rounding is to
> create a LESS precise representation.

Less precise, but, on average, as accurate as the original, and more
accurate than some truncated version of the data.

> Every time they say ROUNDED they're creating an error of 500 parts per
> million. They've been doing it for 45 years.

You still don't understand that it is the truncation to 3 digits that
is out by 500 parts per million.

> I have no problem with being found in error. Nor even name-calling
> when I deserve it. What I object to is criticism when my conclusion
> is, in fact, correct .. which it is in this case.

No. Wrong. Your conclusion that rounding gives the wrong result is
incorrect. Certainly your observation that the rounding is not the
same as the truncated 3 digit set is correct. Your conclusion as to
which of these is wrong and why is flawed.

> I apologize for those remarks.

Thank you.


