Re: || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)



"c gordon liddy" <c@xxxxxxxxxxxxx> writes:
2 different cats.

I've been going through chp 8 of K&R and wanted to write a standard cat
function with a little more functionality than existing solns: I want to
code behavior for the -v switch. It occurs to me that there "should" be
source out there for this and googled for "cat.c unix source" . The second
hit I got was this:

/*
* Concatenate files.
*/

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

char stdbuf[BUFSIZ];

main(argc, argv)
char **argv;
{
    int fflg = 0;
    register FILE *fi;
    register c;
    int dev, ino = -1;
    struct stat statb;

    setbuf(stdout, stdbuf);
    for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
        switch(argv[1][1]) {

Holy smokes! This must count as archeology for unix systems. Funky-looking
main call and register as a type as opposed to storage specifier. This gave
me a pretty good idea what I wasn't looking for.

Yes, that code is archaic, but it's of some historical interest (to
show how much the language has improved if nothing else).

It makes heavy use of "implicit int", which is discouraged for C90
(though the standard doesn't say so) and dropped completely in C99.
In "register c;", register is a storage specifier, just as it is in
modern C; the declaration is equivalent to "register int c;".

I then hit on:
http://www.openbsd.org/cgi-bin/cvsweb/src/bin/cat/cat.c?rev=1.14&content-type=text/plain

The first thing a person notices is the stack of non-standard headers.
Their inclusion is the usual reason for lack of topicality of unix
questions. My platform and my target consist of my non-unix machine; not
only do I not know what's in those headers, I don't have them.

Most of the non-standard stuff is probably not strictly necessary, but
it can be used to improve performance or to provide various bells and
whistles.

Past that is the main control:
while ((ch = getopt(argc, argv, "benstuv")) != -1)
[...]

getopt is non-standard, as I'm sure you know.

[...]
The only case I'm to consider is 'v', so I won't need all of this. getopt
will be something that I have to code from scratch. Out of curiosity, what
header is it defined in?

It varies. Consult your system's documentation, or Google it. If
your system doesn't provide it, there are open-source implementations
out there. (For that matter, there are plenty of open-source
implementations of "cat", but I suppose that would defeat your
purpose.)
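
If 'v' really is the only option you need, you may not need getopt at
all. Here is a minimal sketch in standard C; the name parse_vflag and
its return convention are made up for this example, and it is in no way
a getopt replacement (no combined flags, no option arguments).

```c
#include <string.h>

/* Hypothetical helper, not a getopt replacement: scan the leading
 * arguments for "-v".  Sets *vflag to 1 if it was seen and returns the
 * index of the first non-option argument, or -1 on an unknown option. */
static int parse_vflag(int argc, char **argv, int *vflag)
{
    int i;

    *vflag = 0;
    /* A lone "-" is treated as an operand, not an option. */
    for (i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++) {
        if (strcmp(argv[i], "--") == 0)
            return i + 1;          /* explicit end of options */
        if (strcmp(argv[i], "-v") != 0)
            return -1;             /* only -v is recognized here */
        *vflag = 1;
    }
    return i;
}
```

The caller would then treat argv[result] through argv[argc-1] as the
file names.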

Moving along is:
if (bflag || eflag || nflag || sflag || tflag || vflag)
cook_args(argv);
, so if any flag gets set we cook the args. Maybe instead, we cook with the
args. In this process we traverse through:

} else if (vflag) {
    if (!isascii(ch)) {
        if (putchar('M') == EOF || putchar('-') == EOF)
            break;
        ch = toascii(ch);
    }
    if (iscntrl(ch)) {
        if (putchar('^') == EOF ||
            putchar(ch == '\177' ? '?' :
            ch | 0100) == EOF)
            break;
        continue;
    }
I did my best to get this on the screen. The parts I don't understand here
follow the double pipe, which I read as "inclusive or." In the first if
clause, it would appear that 'M' is substituted for non-ascii chars. What
does
|| putchar('-') == EOF)
do beyond this?

It uses a common convention (at least it's common on Unix) for
displaying non-printable characters. Control characters in the range
0 to 31 are represented as a '^' followed by another character,
usually an uppercase letter; it's determined by adding 64 to the
value. (On old keyboards, the control key actually worked by clearing
a bit in the 7-bit or 8-bit value that was transmitted.) The DEL
character, 127, is represented as ^?; this is a special case.
Characters with the high bit set, in the range 128 to 255, are called
"meta" characters (some old keyboards had a "meta" key that set this
bit), and are represented as "M-" followed by the representation of
the corresponding 7-bit character. For example, character 129 would
be printed as M-^A.

putchar() returns EOF on failure.

All this (except the EOF part) is very specific to the ASCII character
set, something that's not specified by the C standard, but it should
give you enough information to understand what the code is doing (with
a bit of work).
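
The convention described above can be sketched as a small function.
This is only an illustration, not the OpenBSD code: the name visible()
and the buffer interface are made up for the example, it assumes ASCII,
and unlike a real "cat -v" it makes no exception for tab or newline.

```c
/* Hypothetical helper: write the "cat -v" style notation for byte c
 * into buf (which must hold at least 5 chars) and return buf.
 * Assumes the ASCII character set. */
static char *visible(unsigned char c, char *buf)
{
    char *p = buf;

    if (c >= 128) {          /* "meta" character: high bit set */
        *p++ = 'M';
        *p++ = '-';
        c -= 128;            /* drop the high bit, as toascii() does */
    }
    if (c == 127) {          /* DEL is the special case */
        *p++ = '^';
        *p++ = '?';
    } else if (c < 32) {     /* control character: add 64 */
        *p++ = '^';
        *p++ = (char)(c + 64);
    } else {                 /* printable: passed through unchanged */
        *p++ = (char)c;
    }
    *p = '\0';
    return buf;
}
```

With this, visible(129, buf) produces "M-^A", matching the example
above.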

Similarly, I'm out of my depth with what follows the double pipe in the
second if clause.
|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

Wouldn't \177 be a tri-graph? A perfectly-acceptable explanation might be
that it's beyond the scope of my present endeavor and can be omitted.

No, it's not a trigraph; trigraphs are introduced by a double question
mark. It's a character constant that uses an escape sequence. '\177'
is the character whose integer value is 177 in octal, or 127 in
decimal; it's the ASCII DEL character. "ch | 0100" yields the value
of ch with the 0100 bit (octal; 64 in decimal) forced on; it's a terse
way of mapping control-A (1) to 'A' and so forth. The conditional
expression is used to handle the fact that mapping DEL to "^?" is a
special case.
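
To look at the expression in isolation, here is a small sketch (the
function name caret_char is hypothetical, and ASCII is assumed). For
control characters 0 through 31 the 0100 bit is always clear, so OR-ing
it in gives the same result as adding 64.

```c
/* Hypothetical helper: the character printed after '^' for a
 * control character ch (0..31 or 127).  Assumes ASCII. */
static int caret_char(int ch)
{
    /* 0100 is octal for 64; '\177' is octal for 127 (DEL). */
    return ch == '\177' ? '?' : ch | 0100;
}
```

Without the special case, DEL would map to itself, since 127 already
has the 0100 bit set.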

--
Keith Thompson (The_Other_Keith) <kst-u@xxxxxxx>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"