Re: || putchar(ch == '\177' ? '?' : ch | 0100) == EOF)



On Mar 28, 8:11 am, Keith Thompson <ks...@xxxxxxx> wrote:
c gordon liddy <grumpy196...@xxxxxxxxxxx> writes:



"Keith Thompson" <ks...@xxxxxxx> wrote in message news:
87hcerqseb....@xxxxxxxxxxxxxxxxxx
"c gordon liddy" <c...@xxxxxxxxxxxxx> writes:
[...]
Similarly, I'm out of my depth with what follows the double pipe in the
second if clause.
|| putchar(ch == '\177' ? '?' : ch | 0100) == EOF)

Wouldn't \177 be a trigraph? A perfectly acceptable explanation might be
that it's beyond the scope of my present endeavor and can be omitted.

No, it's not a trigraph; trigraphs are introduced by a double question
mark. It's a character constant that uses an escape sequence. '\177'
is the character whose integer value is 177 in octal, or 127 in
decimal; it's the ASCII DEL character. "ch | 0100" yields the value
of ch with a certain bit forced on; it's a terse way of mapping
control-A (1) to 'A' and so forth. The conditional expression is used
to handle the fact that mapping DEL to "^?" is a special case.

I think I could study the above for a long time and not really get
it. It's interesting but not germane to something that can be done in
standard C. I have a double problem with the double pipe here. Not
only is that which is on the right hand side of it obfuscated C, I
don't get the control mechanism. To me, it looks like
if this then that or the other.

The quoted code, as far as I can tell, *is* standard C. I don't
believe it's deliberately obfuscated; rather, it's unusually terse,
written in a style that favors packing lots of information into
complex expressions rather than breaking it down into separate
statements.

You can skip it and go on to something easier if you like, but you
might consider taking one more stab at it.

Let's take a look at the statement:

    if (iscntrl(ch)) {
        if (putchar('^') == EOF ||
            putchar(ch == '\177' ? '?' :
                    ch | 0100) == EOF)
            break;
        continue;
    }

    if ch is a control character then
        if printing '^' fails *or* printing another character fails then
            break out of the loop (give up)
        end if
        Printing succeeded; nothing more to do here: "continue"
    end if

iscntrl(ch) returns true if ch is a "control character". In this
context, it tells us that it's a non-printable character that we want
to represent as a '^' followed by another character (^G for the ASCII
BEL character, ^? for DEL).

Within the if statement we see two calls to putchar(), one to print
the '^' character and one to print whatever follows it. Both results
are compared against EOF (which indicates failure); if either
putchar() fails, we break out of the loop.

The part before the "||" is reasonably clear: try to print a '^'
character and check whether the attempt failed. "||" is a
short-circuit operator, evaluating its right operand only if the left
operand is false, so if the first putchar call fails we won't attempt
the second one.
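
To see the short-circuit behavior in isolation, here is a minimal sketch. try_put() and either_failed() are hypothetical names, not part of the original source; try_put() merely stands in for putchar() so that a failure can be forced and the number of calls counted.

```c
#include <stdio.h>

static int calls;   /* how many times try_put() has run since the last reset */

/* Hypothetical stand-in for putchar(): "fails" (returns EOF) when fail is nonzero. */
static int try_put(int c, int fail)
{
    calls++;
    return fail ? EOF : c;
}

/* Same shape as the original test: nonzero if either "write" failed. */
static int either_failed(int first_fails, int second_fails)
{
    calls = 0;
    return try_put('^', first_fails) == EOF ||
           try_put('G', second_fails) == EOF;
}
```

After either_failed(1, 0) the counter is 1, because the right operand of "||" was never evaluated; after either_failed(0, 0) it is 2.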

Now let's look at the part after the "||":

putchar(ch == '\177' ? '?' : ch | 0100) == EOF

We've covered the higher level control flow, so we're down to figuring
out what the heck

ch == '\177' ? '?' : ch | 0100

means. Some parentheses might make it clearer:

(ch == '\177') ? ('?') : (ch | 0100)

If ch is equal to '\177' (character 177 octal, 127 decimal, ASCII
DEL), the expression yields '?'. The result is that we print a '?'
after the '^'.

Otherwise (for any other control character), the result is (ch |
0100). 0100, since it begins with '0', is an octal constant, equal to
64, a power of 2. "|" is the bitwise "or" operator.

The binary value of 0100 is 01000000. Suppose the value of ch is 7
(ASCII BEL, which we're going to want to print as "^G"). 7 is
00000111. Applying bitwise or to these two operands gives us
01000111, which is 0107 in octal, or 71 in decimal, or 'G' in ASCII.

0100 (octal) is being used as a bit mask; it has a single bit set to
1, and all others set to 0. (ch | 0100) yields the value of ch with
the bit in that particular position turned on. As it happens, that's
a terse way to specify a transformation from a control character to
the corresponding letter.
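
The mapping can be pulled out into a small function to experiment with. This is only a sketch: caret_partner() is a made-up name, and like the original it assumes the ASCII encoding.

```c
/*
 * Hypothetical helper: given a control character c, return the character
 * that follows '^' in its printed form.  Assumes ASCII.
 */
static int caret_partner(int c)
{
    /* DEL (octal 177) is the special case; everything else gets bit 0100 set. */
    return c == '\177' ? '?' : (c | 0100);
}
```

With ASCII, caret_partner(7) yields 'G', caret_partner(1) yields 'A', and caret_partner('\177') yields '?'.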

Note that (ch + 64) would have worked just as well in this context
(since we know the bit we want to turn on isn't already on). The
author probably chose to write "ch | 0100" because he thought of the
operation as setting a bit, not as the equivalent addition.
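
A quick way to check that claim, as a sketch with a made-up function name: loop over the control characters 0 through 31 and compare the two expressions.

```c
/*
 * Hypothetical check: returns 1 if (c | 0100) and (c + 64) agree for
 * every value of c from 0 to 31, where the 0100 bit is never already set.
 */
static int or_matches_add(void)
{
    int c;

    for (c = 0; c < 32; c++) {
        if ((c | 0100) != c + 64)
            return 0;   /* found a value where they differ */
    }
    return 1;
}
```

The two forms only diverge when the bit is already on: in ASCII, 'A' | 0100 is still 'A' (65), while 'A' + 64 is 129.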

Here's a much more verbose chunk of code that does the same thing.
I've kept the "ch | 0100" idiom, but expanded everything else. The
original code is more terse than I tend to like; the following is much
too verbose for my taste, but it might be clearer. (I've compiled it,
but I haven't tested it.)

    if (iscntrl(ch)) {
        /* ch is a control character */
        int result;

        /*
         * The two characters we want to print. The first is '^';
         * we don't know yet what the second is.
         */
        int ch1 = '^';
        int ch2;

        /* Try to print the first character. */
        result = putchar(ch1);
        if (result == EOF) {
            /* Failed, terminate the loop */
            break;
        }

        if (ch == '\177') {
            /* ch is DEL, we want "^?" */
            ch2 = '?';
        }
        else {
            /*
             * ch is another control character.
             * Transform 1 to 'A', 2 to 'B', etc. using
             * our intimate knowledge of ASCII encoding.
             */
            ch2 = ch | 0100;
        }

        /* Print as above */
        result = putchar(ch2);
        if (result == EOF) {
            break;
        }

        /* Both characters printed; go on to the next input character. */
        continue;
    }

--
Keith Thompson (The_Other_Keith) <ks...@xxxxxxx>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

K&R 7.5 has the text that includes the cat function that is alluded to
in 8.1. The filecopy there uses characters instead of buffers to do
its business. I believe it is better suited to my current task than
using buffers. The part needing revision to account for the -v
behavior appears as an external, void function. Main makes the
adjustment for the output to go to stdout.

/* filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
    int c;

    while ((c = getc(ifp)) != EOF)
        putc(c, ofp);
}

I don't know whether I'll be able to get the job done with one int, so
I'll put c in reserve and use ch to match the source I snipped from
the bsd site. I've further added symbols to match Keith's verbose
version.

/* filecopy */
void filecopy(FILE *ifp, FILE *ofp)
{
    int c;
    int ch;
    int result;

    while ((ch = getc(ifp)) != EOF)
        putc(ch, ofp);
}

So, I've got to exchange this for the putc statement:

    if (iscntrl(ch)) {
        /* ch is a control character */
        int result;

        /*
         * The two characters we want to print. The first is '^';
         * we don't know yet what the second is.
         */
        int ch1 = '^';
        int ch2;

        /* Try to print the first character. */
        result = putchar(ch1);
        if (result == EOF) {
            /* Failed, terminate the loop */
            break;
        }

        if (ch == '\177') {
            /* ch is DEL, we want "^?" */
            ch2 = '?';
        }
        else {
            /*
             * ch is another control character.
             * Transform 1 to 'A', 2 to 'B', etc. using
             * our intimate knowledge of ASCII encoding.
             */
            ch2 = ch | 0100;
        }

        /* Print as above */
        result = putchar(ch2);
        if (result == EOF) {
            break;
        }

        /* Both characters printed; go on to the next input character. */
        continue;
    }
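
For comparison, here is one way the pieces might fit together in the terse style. It is a sketch, not the BSD code: filecopy_v is a made-up name, output goes to ofp via putc() rather than through putchar(), and it assumes ASCII and that newline and tab should pass through unchanged (which is how cat -v conventionally treats them).

```c
#include <ctype.h>
#include <stdio.h>

/*
 * filecopy_v: hypothetical variant of filecopy that writes control
 * characters in caret notation (^G for BEL, ^? for DEL).
 * Assumes ASCII; newline and tab are copied as-is.
 */
void filecopy_v(FILE *ifp, FILE *ofp)
{
    int ch;

    while ((ch = getc(ifp)) != EOF) {
        if (iscntrl(ch) && ch != '\n' && ch != '\t') {
            /* Write '^' and then the mapped character; stop on failure. */
            if (putc('^', ofp) == EOF ||
                putc(ch == '\177' ? '?' : ch | 0100, ofp) == EOF)
                break;
            continue;
        }
        /* Ordinary character: copy it unchanged. */
        if (putc(ch, ofp) == EOF)
            break;
    }
}
```

Called as filecopy_v(fp, stdout), it would take the place of the filecopy(fp, stdout) call in main.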


So I think I'm ready to take this to a compiler. I'm on someone
else's laptop. It probably does have a compiler, but its owner is in
an online naval battle. Our girlfriends are at the theatre. I love
theatre when I don't have to go.

Because I have to make the keystrokes, I'll finish with the caller.
No non-standard headers here:
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *fp;
    void filecopy(FILE *, FILE *);

    if (argc < 2)
        printf("die");
    else
        while (--argc > 0)
            if ((fp = fopen(*++argv, "r")) == NULL)
            {
                printf("catv can't open %s\n", *argv);
                return 1;
            }
            else
            {
                filecopy(fp, stdout);
                fclose(fp);
            }

    return 0;
}

Since the google portal is the only way for me to get this back to my
own machine, I include reference material after the sig.

--
c gordon liddy



    if (iscntrl(ch)) {
        if (putchar('^') == EOF ||
            putchar(ch == '\177' ? '?' :
                    ch | 0100) == EOF)
            break;
        continue;
    }

    if ch is a control character then
        if printing '^' fails *or* printing another character fails then
            break out of the loop (give up)
        end if
        Printing succeeded; nothing more to do here: "continue"
    end if

iscntrl(ch) returns true if ch is a "control character". In this
context, it tells us that it's a non-printable character that we want
to represent as a '^' followed by another character (^G for the ASCII
BEL character, ^? for DEL).

Within the if statement we see two calls to putchar(), one to print
the '^' character and one to print whatever follows it. Both results
are compared against EOF (which indicates failure); if either
putchar() fails, we break out of the loop.

The part before the "||" is reasonably clear: try to print a '^'
character and check whether the attempt failed. "||" is a
short-circuit operator, evaluating its right operand only if the left
operand is false, so if the first putchar call fails we won't attempt
the second one.

Now let's look at the part after the "||":

putchar(ch == '\177' ? '?' : ch | 0100) == EOF

We've covered the higher level control flow, so we're down to figuring
out what the heck

ch == '\177' ? '?' : ch | 0100

means. Some parentheses might make it clearer:

(ch == '\177') ? ('?') : (ch | 0100)

If ch is equal to '\177' (character 177 octal, 127 decimal, ASCII
DEL), the expression yields '?'. The result is that we print a '?'
after the '^'.

Otherwise (for any other control character), the result is (ch |
0100). 0100, since it begins with '0', is an octal constant, equal to
64, a power of 2. "|" is the bitwise "or" operator.

The binary value of 0100 is 01000000. Suppose the value of ch is 7
(ASCII BEL, which we're going to want to print as "^G"). 7 is
00000111. Applying bitwise or to these two operands gives us
01000111, which is 0107 in octal, or 71 in decimal, or 'G' in ASCII.

0100 (octal) is being used as a bit mask; it has a single bit set to
1, and all others set to 0. (ch | 0100) yields the value of ch with
the bit in that particular position turned on. As it happens, that's
a terse way to specify a transformation from a control character to
the corresponding letter.

Note that (ch + 64) would have worked just as well in this context
(since we know the bit we want to turn on isn't already on). The
author probably chose to write "ch | 0100" because he thought of the
operation as setting a bit, not as the equivalent addition.

Here's a much more verbose chunk of code that does the same thing.
I've kept the "ch | 0100" idiom, but expanded everything else. The
original code is more terse than I tend to like; the following is much
too verbose for my taste, but it might be clearer. (I've compiled it,
but I haven't tested it.)
.


