Re: Assembly Language - Mathematics WITHOUT maths coprocessor



On Tue, 11 Oct 2005 03:04:28 -0400, ¬a\/b <al@xxx> wrote:

this should be a/b =(a.sn.sn<<(32*precision))/b.sn; ?

Yes.

It's because when you store a number as a fixed point number, all that you are really doing is storing the number multiplied by a constant value. We usually make that constant 256 or 65536 or something of that sort, but it could just as well be 7, 69, 111, anything.

Now, that fourth grade math I was talking about...

When I was in fourth grade, we were taught how to do long division with decimal numbers. We were doing problems like this:
_____
1.2 ) 37


The way we were taught to do this was to convert 1.2 to an integer, by multiplying it by 10 enough times to do so. So for this particular problem, we would multiply it by 10 once to get 12. So that we came up with the correct answer, we also multiplied 37 by 10, so that the problem looked like this:

	   ______
	12 ) 370

Then it was just an ordinary integer division problem.

This is similar to what we're doing with fixed point numbers. We're multiplying them so that they become integers, and then preforming operations on integers instead.

Now in 4th grade, we multiplied both numbers in the problem by the same amount, because we wanted to get the correct answer when we were done. Doing the division would remove the constant that we multiplied each number by, so that the answer we got was the correct one.

For a division problem, "n / d", we basically turned it into "(10 * n) / (10 * d)" and of course the "10 / 10" part of that doesn't affect the answer since 10 / 10 is 1, and so the answer came out multiplied by 1.

Now in fixed point math we actually want the answer to also be multiplied by 10 (or whatever). So we've got to do the equivelant of "(10 * 10 * n) / (10 * d)" so that the 10 in the denominator eliminates one of the 10s in the numerator, leaving us with the other 10 that we want the answer to be multiplied by.

Now before you had written:
	a/b =(a.sn.sn<<(2*32*precision))/b.sn;

Which is basically:
	a/b =(a.sn.sn<<(32*precision)<<(32*precision))/b.sn;

A single "<<(32*precision)" is just like a "* 10"

The problem with this is that a.sn.sn (what's with the two 'sn' there anyway?) already has one "<<(32*precision)" applied to it when it was converted to the fixed point format. So even though we want the numerator to be "n *10 * 10", it's representation in the floating point format is already "* 10" so it only needs one more "* 10" to make it "* 10 * 10".

The same is true with the denominator, which also needs to be "* 10" for it to all work out, but it's already "* 10" because it's in fixed point format, so we don't do anything to it.

i think it is more easy than what you say
for input something like this:
if Lstring is the string of numbers and "a" is the number
result
fnum  a, b;
unum   c;
char  *p1, *p2;
b.sn=read(Lstring, p1);

a=b<<(32*precision);

if(*p1==".") ++p1;
else goto label;
c=read(p1, p2);

if(p1==p2) {label:;

           return a;
          }


if(precision<c.len)
        {k=c.len-precision;
         c>>= 32*k;
        }
a+=c;
return  a;

for output, something like above, should be possible

That certainly doesn't look any easier. It won't work either. (not that I claim to understand it, but what of it that I do understand, it won't work)


The problem is that it's just not that simple of a conversion. For example, 0.69 in binary is 0.1011000010100011, however 69 in binary is 1000101, and as you see, there's just no similarity in the bit patterns.

Now if you took the binary form of 69 and divided it by 100 (decimal), then you'd have the correct bit pattern. However, that's basically just doing what I said, except that it breaks the number into two halves and does each seperately. I fail to see how going through all of that trouble is any easier than the method I outlined.

if(precision<c.len)
        {k=c.len-precision;
         c>>= 32*k;
        }

You particularly want to account for the cases with lots of extra decimal places. For example, if I were doing 32-bit numbers with 8 fractional bits, I'd likely be reading the number into a 64-bit integer. The integer part is 24 bits, so that means I have 40 bits left over that I can toss extra decimal places into.


So say I get this number as input:
3.1415926535897932384626433832795028841971693993751

Those 40 bits can hold log_10(2^40) or 12.04119983 decimal digits. (Or if you wanted to do it the easy way, you'd just divide 40 by 3.321928095 and get the same number.) So what I'd do in my number reading routine is make it just always read 12 digits after the decimal point.

So if I gave it the above number, it would treat it as 3.141592653589, and if I gave it 3, it would treat it as 3.000000000000. So it works like this:

1.  let x = 0 and let c = 12
2.  Read next character.
3.  If character is a number, multiply x by 10 and then add that number.
4.  If character is a decimal point, go to step 10.
5.  If character is anything else, go to step 20.
6.  Go to step 2.

10.  Read next character
11.  If character is a number, multiply x by 10 and then add that number.
12.  If character is anything else, go to step 20.
13.  Subtract 1 from c.
14.  If c == 0, go to step 30.
15.  Go to step 10.

20.  Multiply x by 10.
21.  Subtract 1 from c.
22.  If c != 0, go to step 20.

30.  Shift x left by 8 bits (or whatever your "32*precision" is).
31.  Add 500000000000 (half of 10^12) to x.
32.  Divide x by 1000000000000 (which is 10^12)
33.  Rejoice that you now have the number in fixed point format.

So steps 1 through 22 convert 3.141592653589 to 0x2DB75839F15.

You may notice that the next digit in pi is a 7, but that I didn't provide for rounding up the 9. This is because so many of those digits aren't even going to make it into the final number that the error caused by not doing so is very very very small. Like 0.00000001% small. So it's just not worth the trouble.

Step 30 adds in the normal "<< 8" format of the numbers, which turns it into 0x2DB75839F1500. Then step 31 takes care of rounding. Then step 32 moves the decimal point 12 places left, turning the number into 0x324. That makes it 11.00100100 in binary, which in decimal is 3.140625, which is as close to pi as you're going to get with only 8 fractional bits. So for the most part, most of those 12 digits we looked at were unnecessary, however on extremely rare cases they actually do make a difference, for example 1.005859374 would become 0x101, but 1.005859376 would become 0x102. Not really worth making a fuss over, but as long as you've got those extra 40 bits and can input 12 digits, you might as well.
.




Relevant Pages

  • Re: multiplication
    ... As a preliminary step to multiplying, work out the algorithm to add ... numbers has at most n+1 digits. ... character array will contain the integer value or the character value ... One simplifies input/output; ...
    (comp.lang.c)
  • Re: Output to Excel
    ... a percentage (using xlswrite), is this possible using ... multiplying the value by 100 and truncating at some particular ... the character '%' to show up after them and some numbers to -not- ...
    (comp.soft-sys.matlab)
  • Re: strcpy
    ... >> mbcs), ... >> not using plain chars. ...
    (comp.lang.cpp)
  • Re: Easy formatting questions :-)
    ... 15 columns and a binary string of unknown/variable length. ... The format of the input file 'test.txt': ... declare a huge character string, bigger than you will ever need ... that will write out the title and first 100 digits one the first ...
    (comp.lang.fortran)
  • Re: Arithmetic - Multiply & Divide vs * & /
    ... > exceeded, which is apparently 15 digits in your case. ... > |Decor Decto match the field it's multiplying. ... > |Any should I always be using Multiply and Divide fctns, ... > of digits after the point, it's generally necessary to use DIVIDE. ...
    (comp.lang.pl1)