Re: Fortran decimal anyone?

From: John H. Lindsay (jlin_DELETE_THIS_SPAM_ZOT_dsay_at_kingston.net)
Date: 05/13/04


Date: Wed, 12 May 2004 20:19:10 -0400

Hi Fortranners:

While looking at the problem of supporting packed and unpacked
decimal data strings (and at the same time the problems in doing
arbitrary precision integer and fixed point decimal and fixed
point binary arithmetic) in SNOBOL4 under OS/2, I made a list of
the internal representations of fixed point decimal data that I
had seen on Intel hardware, some immediate extensions thereof and
alternates thereto. Some of them were inherited from (I.B.M.,
Burroughs or Honeywell) (mainframe hardware, PL/Is or COBOLs). I
think the list gives an understanding of the potentially huge
size of the 'simple' problem of fixed point decimal data in
FORTRAN. Plainly, since such data exists in files which
Fortranners want to, even need to, process, a means to access,
use, and produce such data in files is needed. BTW, I'm speaking
of data layouts where the data is _not_ 'DISPLAY' (COBOL term,
one digit &c. per byte, for printing or for other _visual_
reading), but rather 'COMPUTATIONAL', for calculation either by
hardware or by carefully optimized in-line generated code or
library software (whether visually readable or not).

All the decimal data field formats I have seen were of fixed
length and of fixed (assumed) position within their data records;
no terminating or attached/embedded field length controlling
characters or subfields were seen. Similarly, while I have seen
cases where decimal point indicators (".", or "," in some
countries) existed in fields I would want to call COMPUTATIONAL,
in such cases the decimal points were always in a fixed
position. Where the decimal point indicator could float in
the field, because of the size of the code required to handle the
data, I'd want to call it DISPLAY. I also regard 'BLANK WHEN
ZERO' fields as DISPLAY.

To repeat a bit, I'm dealing here with the _internal_ machine
format of the data as might be seen in a hexadecimal core dump,
and handled in assembler or a very machine-oriented language.

One Decimal Digit per Byte (Unpacked) Data Formats.
--------------------------------------------------

The digits may be arranged within a data field as big-endian or
little-endian (2 options).

The digit characters seen were either ASCII or EBCDIC (printable)
characters or 1-byte binary representations of 0 through 9 - i.e.
0x00 or 00h through 0x09 or 09h as in assembler source (3
options).
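
As an illustration of the 2 x 3 combinations just described, here is a
minimal Python sketch that decodes an unsigned, unpacked field. The
function name and its parameters are hypothetical, chosen only for this
example; it handles digits only and assumes no blanks or sign
indications in the field.

```python
def decode_unpacked_digits(field: bytes, encoding: str = "ascii",
                           little_endian: bool = False) -> int:
    """Decode an unsigned decimal field holding one digit per byte."""
    # Byte value of the digit 0 in each of the three representations.
    zero = {"ascii": 0x30, "ebcdic": 0xF0, "binary": 0x00}[encoding]
    digits = [b - zero for b in field]
    if any(d < 0 or d > 9 for d in digits):
        raise ValueError("non-digit byte in field")
    if little_endian:
        # Least significant digit is stored first; flip to most-first.
        digits.reverse()
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


print(decode_unpacked_digits(b"0123"))                             # ASCII
print(decode_unpacked_digits(bytes([0xF1, 0xF2, 0xF3]), "ebcdic"))  # EBCDIC
print(decode_unpacked_digits(bytes([3, 2, 1]), "binary", True))     # binary
```

All three calls decode the same value, 123, from three different
byte layouts.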

In none of the cases seen could 2 or more sign characters or
sign indications occur in any one field.

Some data formats allowed leading blanks for 0's, and some did
not. Some allowed trailing blanks and some did not (4 options,
but see below where leading and/or trailing blanks are accounted
for as they may occur with each of the other possibilities).

I'm treating the cases where a trailing DB, _DB, CR or _CR
(where the '_' represents a blank character) is allowed as a
negative sign as 'DISPLAY', even though some COBOLs and PL/I
allowed doing arithmetic with such things; I haven't seen any
hardware that did that directly. Similarly, I'm treating fields
with comma, decimal point and embedded blank characters (as
thousands, millions, billions, ... separators, but not as an
actual decimal point indicator), whether leading, trailing or
embedded, as DISPLAY, not as COMPUTATIONAL.

No cases of hardware support of biased decimal data were seen,
nor were cases of hardware support of range limits for decimal
data (other than simple digit and sign capacity of the field).

The sign characters seen were either
     (1) Never present (data always assumed positive in cases
         seen; with leading/trailing blanks 4 options).
     (2) Always present (even if positive).
     (3) Optional (absent => positive was the only usage seen in
         this case).

In some cases, the sign indication, if present, was a separate
character -- a minus sign ('-') or a plus sign ('+') in the
system standard character set were the only ones seen. In
others, the sign indication, if present, was combined in a byte
with a digit indication -- an 'overpunched digit' character.
Typically this was A to J for +0 to +9, and K to T for -0 to -9
(EBCDIC). The concept of 'overpunched blank' was not seen.

In the case of (2) or (3) above, the sign code was either
     (a) A leading separate character as the first character of
         the field (blanks may or may not occur between the sign
         and digits, and may or may not follow them - 4 options).
     (b) A leading separate character as in (a), but following
         any leading blanks and preceding any other digits (same
         4 options).
     (c) A trailing separate character (same 4 options).
     (d) An embedded separate character in a fixed position (cf.
         PL/I picture format character J ; same 4 options).
     (e) A floating separate character, but always following any
         leading blanks and preceding any trailing blanks (same
         4 options).
     (f) A leading 'overpunched digit' character as the first
         character of the field (leading blanks not possible in
         this case; with trailing blanks, 2 options).
     (g) A leading 'overpunched digit' character (following any
         leading blanks; the 4 options).
     (h) A trailing 'overpunched digit' character (the 4 options).
     (i) An embedded 'overpunched digit' character in a fixed
         position (the 4 options).
     (j) An 'overpunched digit' where the sign could float to any
         digit in the field (the 4 options).
     --- (38 options)
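
To make the separate-sign cases concrete, here is a rough Python sketch
covering a leading sign as in (a) and (b) and a trailing sign as in (c),
with optional blanks and with an absent sign taken as positive. The
function name is hypothetical, and the sketch assumes the field has
already been translated to ASCII text; the overpunched cases (f) to (j)
are not handled.

```python
def decode_separate_sign(field: str) -> int:
    """Decode a field with an optional separate leading or trailing sign."""
    text = field.strip(" ")            # drop leading and trailing blanks
    if not text:
        raise ValueError("field holds no digits")
    sign = 1                           # absent sign => positive
    if text[0] in "+-":                # leading sign, cases (a) and (b)
        sign = -1 if text[0] == "-" else 1
        text = text[1:].lstrip(" ")    # blanks between sign and digits
    elif text[-1] in "+-":             # trailing sign, case (c)
        sign = -1 if text[-1] == "-" else 1
        text = text[:-1].rstrip(" ")
    if not (text.isascii() and text.isdigit()):
        raise ValueError("unexpected character in field")
    return sign * int(text)


print(decode_separate_sign("-  123"))
print(decode_separate_sign("  +42 "))
print(decode_separate_sign("123-"))
print(decode_separate_sign("   77"))
```

Even this small subset needs several branches, which gives some feel
for the size of the full problem.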

No case was seen where a separate sign character could float
among leading or trailing blanks, and no case was seen where a
separate sign character occurred in a 'fixed' position other than
as the first character of the field or immediately before the
digit characters.

For the unpacked data forms, this gives 2 x 3 x (4 + 38 + 38) =
480 cases.
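
The tally can be checked mechanically. The short Python sketch below
only restates the option counts listed above; the variable names are
made up for the illustration.

```python
# Blank-handling options for each sign placement (a)-(j) listed above.
placement_options = {
    "a": 4, "b": 4, "c": 4, "d": 4, "e": 4,   # separate sign character
    "f": 2, "g": 4, "h": 4, "i": 4, "j": 4,   # overpunched digit
}
per_sign_rule = sum(placement_options.values())

digit_orders = 2      # big-endian or little-endian digit order
digit_encodings = 3   # ASCII, EBCDIC, or binary 0x00-0x09
never_signed = 4      # sign rule (1): only the blank options vary

# Sign rules (2) "always present" and (3) "optional" each contribute
# the full set of placements.
unpacked_cases = digit_orders * digit_encodings * (never_signed + 2 * per_sign_rule)
print(per_sign_rule, unpacked_cases)   # prints: 38 480
```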

Two Decimal Digits per Byte (Packed) Data Formats.
-------------------------------------------------

In the cases I've seen, a decimal digit was represented by a hex
digit, and as there are 16 hex digits (0 through F), the other 6
hex digits were used somehow as a sign. I haven't seen any use
of the other 6 hex digits (A through F) as anything other than a
sign.

The digits may be either big endian or little endian within a
byte (2 options).

The bytes may be either big endian or little endian within a
field (2 options).

A sign may be
     (i) Never present (data always assumed positive in cases
         seen)
     (ii) Always present (even if positive).
     (iii) Optional (absent => positive was the only usage seen in
         this case).

In the case of (ii) and (iii), the sign could be
     (I) Leading.
     (II) Trailing.
     (III) Embedded at a fixed location in the field.
     (IV) Floating within the field.

This gives a maximum of 2 x 2 x 3 x (1 + 4 + 4) = 108 cases.
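
As a sketch of one corner of this space, the Python function below
decodes a packed field with selectable digit order within a byte and
byte order within the field, with the sign (if any) as the last
hex digit, i.e. case (II). The function and parameter names are
hypothetical. The particular sign codes (C, F, A, E for plus; D, B for
minus) are the I.B.M. mainframe convention and are an assumption added
here, not something stated above; other hardware may differ.

```python
def decode_packed(field: bytes, high_digit_first: bool = True,
                  little_endian_bytes: bool = False,
                  signed: bool = True) -> int:
    """Decode a packed decimal field with an optional trailing sign digit."""
    data = field[::-1] if little_endian_bytes else field
    nibbles = []
    for b in data:
        hi, lo = b >> 4, b & 0x0F
        # Digit order within the byte: high hex digit first, or low first.
        nibbles.extend((hi, lo) if high_digit_first else (lo, hi))
    sign = 1
    if signed:
        sign_digit = nibbles.pop()     # trailing sign, case (II)
        if sign_digit in (0xB, 0xD):   # assumed minus codes
            sign = -1
        elif sign_digit not in (0xA, 0xC, 0xE, 0xF):  # assumed plus codes
            raise ValueError("last hex digit is not a sign code")
    value = 0
    for d in nibbles:
        if d > 9:
            raise ValueError("sign code found among the digits")
        value = value * 10 + d
    return sign * value


print(decode_packed(bytes([0x12, 0x3C])))                            # 123
print(decode_packed(bytes([0x12, 0x3D])))                            # -123
print(decode_packed(bytes([0x3C, 0x12]), little_endian_bytes=True))  # 123
print(decode_packed(bytes([0x12, 0x34]), signed=False))              # 1234
```

The embedded and floating sign cases (III) and (IV) would need a scan
for the sign digit rather than a fixed position.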

Plainly, using this whole scheme of possibilities is not
reasonable, and any one machine or implementation of a language
on a particular piece of hardware that I've seen uses only a
small subset of the possibilities. Choosing a subset suitable
for a particular machine or implementation of a language is no
simple job if one tries to be compatible with a large number of
the data forms existing in files.

Even the attempt to simplify the problem by allowing the
conversion of the above forms to and from a common form for doing
arithmetic, and doing the arithmetic in that form, is big enough
(and probably slower than we Fortranners would like to call
reasonable).

John.

-- 
John H. Lindsay  jlin_DELETE_THIS_SPAM_ZOT_dsay@kingston.net
48 Fairway Hill Crescent, Kingston, Ontario, Canada, K7M 2B4.

