Re: PL/I string representations

From: Edward G. Nilges (spinoza1111_at_yahoo.com)
Date: 01/01/04


Date: 31 Dec 2003 16:33:00 -0800

robin <robin_v@bigpond.mapson.com> wrote in message news:<XiBIb.71692$aT.42968@news-server.bigpond.net.au>...
> From: Randy Howard <randy.howard@FOOmegapathdslBAR.net>, Imperious Overlords Anonymous
> Date: Mon, 29 Dec 2003 05:15:29 -0600
>
> > Well, this may not be a definitive source, but it does have several
> > chapters devoted to PL/I. The book is "Introduction to Programming
> > Languages", by W. Wesley Peterson, Prentice-Hall, 1974.
> >
> > I apologize in advance to the expert PL/I programmers if this is all
> > redundant, there have been several here who claim little or no knowledge
> > of the language, so it was interesting to me, hopefully it will be to
> > those individuals as well. If you care not, now would be a good time to
> > bail.
> >
> > There are multiple chapters in the text on PL/I, but since we are
> > concerned with EGN's claim that PL/1 originally used a byte to store
> > string length, thus limiting strings to 255 characters, I focused on
> > Chapter 8, dedicated to string processing with PL/I.
> .
> There is nothing in the PL/I language that limits the length of a
> string to any particular value.
> .
> Nor, for that matter, is there anything to limit the size of
> an array.
> .
> The actual limits are the prerogative of the manufacturer.
> .
> > The book also covers BASIC, FORTRAN, ALGOL, APL, COBOL, SNOBOL, LISP,
> > as well as appendices on "Revised Report on the Algorithmic Language
> > ALGOL60", "Computer Programming Problems", "Typical Computer Runs",
> > "Implementation of Recursive Procedures" and "Language Processor
> > Availability".
> >
> > In the latter, Appendix F, page 351, he writes "PL/I -- all programs
> > except the last program in Chapter 4 were run on the IBM PL/I Optimizing
> > Compiler. The last program in Chapter 4, and many others, were run with
> > the IBM F-level compiler, and except for minor restrictions on the use
> > of pointers and the requirement in some cases for ENTRY declarations, I
> > believe all will run with the F-level compiler."
> >
> > I also note that he used IBM compilers for the FORTRAN, ALGOL60, APL,
> > and COBOL sections, along with Bell Labs SNOBOL4, Illinois Institute
> > of Technology SPITBOL, Waterloo LISP and "UHBASIC, information available
> > from the author". The reason I mention this is that the author makes
> > the claim in the first paragraph of the first chapter on PL/I that
> >
> > "It appears now that PL/I will become the most important
> > programming language in the next decade."
> >
> > The heavy dependency on IBM compilers may simply imply that IBM was
> > marketing PL/I heavily as "the next big thing" at the time; however, a
> > Jargon File entry says the famous dmr quote "you know where
> > to find it" arose from defending C against PL/I.
> >
> > The author probably was not aware of C (nor had any idea of its future
> > wide use) at this time, as it was just getting started during the period
> > in which this book was being written. Similarly, BCPL and B are
> > not mentioned at all.
> >
> > In October of 1963, a committee formed within SHARE, the organization
> > of users of large IBM computers, to specify a program with the
> > goals of satisfying scientific, commercial, real-time and systems
> > programmers. Also NPL (New Programming Language) was an early name
> > for PL/I. The original expectation was that the new language would
> > be an extension of FORTRAN, but the committee took a broader view.
> .
> You can't make a silk purse out of a sow's ear.
> .
> > The first IBM manual on PL/I appeared in 1965, and the result was
> > a language bearing a close resemblance to ALGOL,
> .
> The best parts of Algol, FORTRAN, and COBOL were taken
> and improved on to become the features of PL/I.
> In respect of Algol, the only resemblances were in the block
> structure and in the existence of recursion.
> I/O was modelled on COBOL and FORTRAN.
> Interrupt handling was new.
> .
> > but much broader
> > in scope and more closely oriented to real computers (particularly
> > the IBM 360/370) than ALGOL.
> .
> Algol gained popularity as a publication language and as a programming
> language, but its main weaknesses were the initial absence
> of any standard I/O (which was generally clumsily implemented)
> and the problems that arose on hardware.
> .
> > A contemporary of the author, a "Professor Saul Rosen" (university
> > unspecified) flamed the IBM/SHARE community for creating a new language
> > as a "crash project" completed by only six men, "some of which had other
> > responsibilities at the same time" (THE HORROR!!) which was believed to
> > be important to the entire computing community at the time. A source for
> > this "flame" was a review in "Computing Reviews" VI No. 2 (1965).
> > Apparently politics and complaining about language standards committees
> > is not a new thing after all. :-)
> >
> > The author of this book goes on to say that the result of this
> > "rush job" was a lot of special cases, exceptions, and inattention
> > to the little details that made the language difficult to clean up,
> .
> Which language? FORTRAN?
> .
> > except by defining a new language.
> .
> The only way forward was to define a new language.
> It took 25+ more years to evolve FORTRAN to the stage
> where it resembled a modern programming language (Fortran 90),
> but it still lacks basic features that were available in
> PL/I in 1966.

This is true. However, in 1966 there were workable compilers for
Fortran. I found a workable compiler for Fortran in 1971 (with a bug
added by a customer engineer; cf. IEEE Transactions in the History of
Software, Spring/Summer 1999).

It compiled programs of reasonable size into 8000 six bit bytes of
memory, using a distant precursor of Java byte codes.

In 1966 no such technology existed for PL/I. The F compiler was
unusable, in my direct experience, by those Chicago corporations who
standardized around PL/I before 1973, when the Optimizer and Debugger
compilers became available, and several dp managers were canned as a
result.

PL/I was vaporware until 1974. However, IBM did a terrific job between
about 1966 and about 1973 on the Optimizer and the Debugger compilers.
Its conduct in fact resembles Microsoft's in more recent years, where
it released a farce and then followed up with an impressive system.
This prefigured the period between Windows 1.0 and 3.1.

> .
> > He goes on to say that this
> > caused manufacturers of other computers (than IBM) to be very
> > reluctant to produce PL/I compilers. If you're curious how
> > the "most important language of the decade" could have this
> > flaw,
> what is that?
> > we think similarly. However, he explains this away as the
> > growing popularity of PL/I amongst the IBM community forced
> > other manufacturers to produce compilers to remain competitive.
> >
> > String declarations in PL/I look like the following:
> >
> > DECLARE A CHARACTER(5) INITIAL ('ALICE');
> > DECLARE B CHARACTER(5);
> > DECLARE C CHARACTER(10) INITIAL ('YAMAMOTO');
> >
> > The author points out that they always have the exact length
> > specified, so that the string representation of "C" will
> > actually be 'YAMAMOTO  ' with two blanks inserted to make
> > its length ten.
> >
> > Additionally, variant length strings can be specified with
> > DECLARE E CHARACTER(20) VARYING;
> >
> > The specified length is the minimum, and each time a character
> ~~~~~~~~
> Actually, the specified length is the maximum.
> .
> > string is assigned to E, the length is stored with it. There
> > is a built in function LENGTH that gives the length of
> > character strings.
> >
> > Concatenation is done as follows:
> >
> > E = A||C;
> >
> > results in the value 'ALICE YAMAMOTO '. He doesn't explain the
> > space between the two, it may be automatically inserted, or a
> > typo. I do not know for sure.
> .
> An error. There would be no space in this instance.

In my publication experience, computing authors MUST proofread their
manuscripts very, very carefully, because well-intentioned and highly
intelligent editors without programming backgrounds will often insert
spaces between what seem to be words, and capitalize names when they
need (in the case of the C language) to be completely case-sensitive.
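
For anyone who wants to check the concatenation result quoted above without a PL/I compiler to hand, here is a rough Python sketch of the padding rules as described. It is not PL/I: the helper names fixed() and varying() are made up for illustration, and truncating an over-long value is an assumption of the sketch.

```python
def fixed(value: str, length: int) -> str:
    """Mimic CHARACTER(length): blank-pad on the right, or truncate (assumed)."""
    return value[:length].ljust(length)


def varying(value: str, max_length: int) -> str:
    """Mimic CHARACTER(max_length) VARYING: keep the value up to the maximum."""
    return value[:max_length]


A = fixed("ALICE", 5)        # DECLARE A CHARACTER(5) INITIAL ('ALICE');
C = fixed("YAMAMOTO", 10)    # DECLARE C CHARACTER(10) INITIAL ('YAMAMOTO');
E = varying(A + C, 20)       # E = A||C;
```

In this sketch E holds 15 characters: the two names run together with no blank between them, followed by the two pad blanks that C carries.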

> .
> > There is also a "beautiful" TRANSLATE(S,T,U) function. T and U must
> > be character strings of identical length, every occurrence in S of
> > the first character in U is replaced by the first character in T,
> > etc. For example:
> >
> > TRANSLATE(S,'GAR','BEL')
> >
> > will replace every 'B' in S by 'G', every 'E' by 'A' and every
> > 'L' by 'R'.
> >
> > If the third argument is omitted entirely, then all 256 possible
> > byte values, in order of their binary code, are assumed to
> > be present. A sort of "auto template" to make wholesale
> > translations easier. Keep in mind this is all based on the 256
> > character EBCDIC character set.
> .
> It is based on whatever character set is available.
> The PL/I language never specified what internal code should be used.
> At that time (1966), almost every manufacturer had a different
> internal representation for characters - a situation that
> persisted into the 1980s.

Mostly true, but EBCDIC was pretty much dominant from 1964 to about
1980.
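
The TRANSLATE(S,T,U) behaviour described in the quoted text can be sketched in Python as follows. This is an approximation: translate() is a hypothetical helper, it works on raw bytes rather than any particular character set, it insists that T and U have equal length as the quoted description does, and it does not try to settle what happens when a character occurs twice in U.

```python
def translate(s: bytes, t: bytes, u: bytes = bytes(range(256))) -> bytes:
    """Replace each byte of s found in u by the byte at the same position in t.

    When u is omitted it defaults to all 256 byte values in binary order,
    so t then acts as a complete 256-entry translation table.
    """
    if len(t) != len(u):
        raise ValueError("t and u must have the same length")
    return s.translate(bytes.maketrans(u, t))


# TRANSLATE(S,'GAR','BEL'): every B becomes G, every E becomes A, every L becomes R.
example = translate(b"BELL", b"GAR", b"BEL")
```

Nothing here depends on EBCDIC; the table is simply indexed by byte value, whatever code those values stand for.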

> .
> > I also found out that it was actually possible to use language
> > keywords as variable names and the parser would figure out what
> > you meant from context. EGN was claiming that C has design flaws?
> > Yeah, right.
> .
> Having keywords that are not reserved words means that should
> the language be extended at some future time, it will
> not prevent old programs from compiling successfully.
> .
> It also means that it is unnecessary to commit to memory
> all of the reserved words. It would otherwise be possible
> to inadvertently use an obscure keyword as a variable.
> Programmers don't go out of their way to use keywords as
> identifiers.

Using Hungarian notation avoids this problem.

> .
> > If you are wondering where the punch line is with respect to the
> > maximum string length or the internal representation of a
> > string, unfortunately the author did not mention it at all. Either
> > it was an omission, or it was not a problem. To be fair, this
> > book was written almost ten years after the language came about,
> > so the problem might have been gone by this time.
> .
> The maximum length was sufficiently high so as not to be a problem -
> 32767 characters for the IBM F compiler, c. 1966.

Are you certain that 32767 was the limit? It may be, but we know the
compiler reserved two bytes. Either completely invalid strings could
exist with a negative length...or else the limit was 64K.

In my PL/I praxis in the 1970s, the 32767 (or 64K) limit was indeed a
problem, because around 1972, IBM mainframes became available with
larger storage sizes as a result of virtual storage.

Therefore in 1976 I developed a set of tools to represent superstrings
as arrays of PL/I strings, and I resuscitated this technology between
1991 and 1995 before the introduction of Visual Basic 4. Visual Basic
3 limited strings to 64K.
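
To give the flavor of the idea, here is a minimal Python sketch of a long string held as an array of bounded chunks. It is not the original tooling: the class name SuperString, its methods, and the 32767 chunk limit are all assumptions made for the sake of illustration.

```python
CHUNK_LIMIT = 32767  # assumed maximum length of one underlying string


class SuperString:
    """A long string stored as a list of chunks, none longer than chunk_limit."""

    def __init__(self, text: str = "", chunk_limit: int = CHUNK_LIMIT) -> None:
        if chunk_limit < 1:
            raise ValueError("chunk_limit must be positive")
        self.chunk_limit = chunk_limit
        self.chunks = []
        self.append(text)

    def append(self, text: str) -> None:
        # Top up the last chunk first, so every chunk but the last stays full.
        if self.chunks and text:
            room = self.chunk_limit - len(self.chunks[-1])
            self.chunks[-1] += text[:room]
            text = text[room:]
        # Cut whatever remains into chunk-sized pieces.
        for start in range(0, len(text), self.chunk_limit):
            self.chunks.append(text[start:start + self.chunk_limit])

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)

    def char_at(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Because all chunks except the last are full, plain division finds the chunk.
        chunk, offset = divmod(index, self.chunk_limit)
        return self.chunks[chunk][offset]
```

Keeping every chunk except the last one full is a design choice of this sketch: it makes indexing a single division instead of a search through the chunk lengths.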

I used the software to build a network analysis program for Illinois
Bell in 1980 that used printer graphics to show patterns.

> .
> IBM's "PL/I Reference Manual" (2nd edition, March 1968 C28-8201-1) p. 34
> specifies a limit of 32767 characters.

It was a common experience in the early days for tech writers to write
what they thought were sensible values, and for a product to be
released with "undocumented cool features" as a result: my son, for
example, discovered a whole new set of features in the Texas
Instruments Speak and Spell at the age of four.

But I cannot remember the exact limit.

The IBM 1401 was released with a three digit address, and a "zone bit"
for configurations above 1K of memory. The addition of the Modify
Address opcode meant that in fact the IBM 1401 had an entire
system for super-packed math which was arguably cool and an example of
the fun that was to be had back then.

> As the limit of 32767 was not flagged as a change from the first
> edition, it can be taken that the limit of 32767 was in
> existence in the first edition.
> Furthermore, the 2nd edition applied to the 4th version of the
> PL/I F compiler, and therefore the limit of 32767 characters applied
> at least to the 3rd version of the F compiler.
> .
> > Attempts at finding a reference to this string limitation in
> > early PL/I compilers via google were unsuccessful; however, many
> > of the links which came up were dead and gone. IBM still
> > offers compilers for their big iron though, but I did not find
> > any reference to this on their pages either.
> >
> > In the course of other investigation, I discovered that PL/I was
> > developed at IBM's Hursley Park research facility in England.
> .
> That was IBM's PLI F compiler. The D compiler was developed
> elsewhere, possibly Germany.
> .
> Other manufacturers developed their compilers independently.

David Cutler, now with Microsoft, was the architect of Windows NT in
the 1990s. In an earlier life he was a compiler developer for Digital
Equipment Corporation, and he wrote a book, published in the 1980s by DEC
press, on the development of the DEC PL/I compiler.
> .
> > Maybe Richard has a better chance to bulldog this one. :-)
> >
> > Some IBM docs on a windows PL/I compiler are here:
> > http://www-306.ibm.com/software/awdtools/pli/pliwin/library/
> >
> > I did find this at a link off the above:
> > "The storage allocated for VARYING strings is 2 bytes longer than the
> > declared length. The leftmost 2 bytes hold the string's current length"
> .
> That gives a maximum of 32767 characters for current IBM PL/I for
> Windows and other workstation products and for the current
> IBM Enterprise PL/I for the mainframe machines.

No, the real maximum is 65535 as long as you have any kind of unsigned
capability. If memory serves from my adventures in BAL (the IBM 360
assembler), you had unsigned ops. You certainly have 'em on Windows.

I suppose you could find a meaning for "a string with negative
length". Perhaps a "string with negative length" is a prompt for
input.

Seriously, folks, I think the limit is 64K. If you have two bytes to
play with you have 16 bits. 2^16 is 65536.
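
The arithmetic behind the two candidate limits can be laid out in a few lines of Python. This only shows what a two-byte prefix can hold under each reading; it says nothing about which reading a given compiler actually uses. The big-endian byte order and the helper name fits() are assumptions of the sketch.

```python
import struct

# The two bytes of a length prefix with every bit set.
raw = b"\xff\xff"

# Read as a signed big-endian halfword, these bits mean -1 ...
as_signed = struct.unpack(">h", raw)[0]
# ... and read as an unsigned halfword, the same bits mean 65535.
as_unsigned = struct.unpack(">H", raw)[0]

MAX_IF_SIGNED = 2**15 - 1     # 32767
MAX_IF_UNSIGNED = 2**16 - 1   # 65535


def fits(length: int, signed: bool) -> bool:
    """Return True if length can be stored in a two-byte prefix."""
    limit = MAX_IF_SIGNED if signed else MAX_IF_UNSIGNED
    return 0 <= length <= limit
```

Sixteen bits give 65536 distinct values, and one of them is spent on length zero, so the largest unsigned length is 65535. The 32767 figure is what the signed reading gives.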

> .
> > There is also apparently a VARYINGZ in the language now that uses
> > C-style \x00 terminated strings to avoid arbitrary limits.

Talk about pandering...
> .
> The C-style does not avoid arbitrary limits.
> Such a string cannot contain the character 00x as
> this would terminate it prematurely.

I agree. But C programmers sometimes have trouble with this.
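
The point about the embedded zero byte can be shown with a small Python sketch that imitates both conventions on raw bytes. The function names are made up for illustration, and the two-byte big-endian prefix is an assumption modeled on the VARYING description quoted earlier.

```python
import struct


def c_style_read(buffer: bytes) -> bytes:
    """Read the way a NUL-terminated string is read: stop at the first zero byte."""
    end = buffer.find(b"\x00")
    return buffer if end == -1 else buffer[:end]


def prefixed_pack(payload: bytes) -> bytes:
    """Store the payload behind a two-byte length prefix (assumed big-endian)."""
    return struct.pack(">H", len(payload)) + payload


def prefixed_read(buffer: bytes) -> bytes:
    """Read back exactly as many bytes as the two-byte prefix says."""
    (length,) = struct.unpack_from(">H", buffer)
    return buffer[2:2 + length]


payload = b"AB\x00CD"                        # data with a zero byte in the middle
seen_by_c = c_style_read(payload + b"\x00")  # terminator appended, as in C
seen_by_prefix = prefixed_read(prefixed_pack(payload))
```

The terminated reading sees only the first two bytes, while the prefixed reading returns all five. The trade is one limit for another: the prefix caps the length, and the terminator bans one byte value from the contents.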

> .
> > I found no reference to this in the earlier book.
> >
> > --
> > Randy Howard


