Re: The infamous ^Z problem




"Keith Thompson" <kst-u@xxxxxxx> wrote in message
news:ln646kgsyc.fsf@xxxxxxxxxxxxxxxxxx
"Eigenvector" <m44_master@xxxxxxxxx> writes:
I've been surfing the FAQ and Google for about a week and haven't
quite figured out this one.

I have a file that changes on a periodic basis, and every once in a
while ^Zs will appear in the file for reasons I don't want to get
into. I need to get rid of those ^Zs and need to do it in C, as it
is the only tool available to me that can handle the file size.

So I cooked up some code and tried it out on one platform, and it
works great; it doesn't work so great on another, and I am trying to
understand why. I did my best to write standard C, but perhaps that
is where I'm failing.

#include <stdio.h>
int main(int argc, char *argv[])
{
FILE *infile, *outfile;
int c; /*picked that up from the FAQ */
if ( (infile = fopen(argv[1], "rb")) == NULL) /*picked the binary
part up from this google group*/

What if argv[1] doesn't exist? Check the value of argc.

{
printf("Cannot open file\n");

Error messages are traditionally written to stderr rather than stdout.

exit(1);

The only portable values for the argument to exit() are 0,
EXIT_SUCCESS, and EXIT_FAILURE. In this case, I'd recommend using
EXIT_FAILURE, which would also force you to add "#include <stdlib.h>"
(which is required for the exit() function anyway).

}
if ( (outfile = fopen("Clean_file", "w+")) == NULL)

You opened the input file in binary mode, "rb", which seems correct,
but you opened the output file in text mode *and* update mode, even
though you only write to it. For consistency, use "wb" (write-only,
binary mode).

I won't argue the advantages of reading and writing cleanly, although you
are certainly correct here. I'm just trying to pound out something that
will work - more concept than production code. Although I will take your
suggestions to heart.


{
printf("Cannot open output file\n");
exit(1);
}
while ((c=fgetc(infile)) != EOF )
{
if(c == 0x1a) /* This is where I'm having a problem */
/* if(c == '\0x1a') This fails with compiler error - more than
one character defined for type char */

0x1a should work. '\x1a' is equivalent and probably clearer.

Okay, I see now where I went wrong. \0 is for octal representation
rather than hex. Let me go back and try '\x1a' and see if I do better.



(A compiler *could* accept '\0x1a', but it doesn't mean what you think
it means. The \0 represents a null character, and it's followed by
characters 'x', '1', and 'a'. Multi-character character literals are
legal, but their meaning is implementation-defined; they're hardly
ever useful.)

{
c='_'; /*replace bad control char with something innocuous */
}
fputc(c,outfile);
}
fclose(infile);
fclose(outfile);

"return 0;" or "exit(0);".

}
Yeah, it's pretty primitive code, but I'm more interested in getting
the basics working before I go in and optimize the way it handles the
input file. This compiles on xlC and HP's ANSI C compilers.

<OT>Since you're using Unix-like systems, "man tr".</OT>

Actually, `tr` absolutely doesn't work here; the ^Z is its death (the
same goes for sed, batch vi, and a host of other shell commands), but
I won't discuss that here. Besides, I will at some point need to port
this to Windoze.


In the first if statement dealing with the ^Z, the program doesn't
detect the control characters in the file,

I don't know why it would cause that problem. I suspect you may be
misinterpreting the symptoms, but it's hard to tell.

Agreed, it's hard to diagnose code over a newsgroup. In the code I
have working, I put a puts() call in the if branch to print something
whenever the condition was met. When I use the 0x1a notation, that
branch is never taken, although the program completes normally.


in the second statement the
compiler complains about syntax. If I make c a char, it finds the
control characters and replaces them, but then blows away the EOF
character and nukes the file.

I suspect it's the way I'm defining c == \0x1a that is leading me
astray here. I can't find any good, consistent documentation on
exactly how to represent hex or octal values in C code or in
string/character operations.

Really? Any decent C reference should explain that. If nothing else,
you can get the latest draft of the C standard at
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>; see
sections 6.4.4.4 and 6.4.5.

--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx
<http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*>
<http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

