Re: simple file compression program



In article <c3faab3c-8d4b-4720-bfd4-6affa6b2bd96@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
sophia <sophia.agnes@xxxxxxxxx> wrote:

The following is a file compression program, using elimination of
spaces, which I saw in a book:

#include<stdio.h>
#include<stdlib.h>

int main(int argc,char * argv[])
{

FILE* fs,*ft;

fs = fopen(argv[1],"r");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[1]);

You are not outputting a \n as the last character. It is
implementation-defined as to whether the last output line will
appear in such a case (and it is also possible that it will appear
but then be immediately overwritten by the next shell prompt, making
it seem that it did not appear).

Error messages are better output to stderr.

exit(1);

exit(1) does not have a portably defined effect: the status it
reports is implementation-defined. The only arguments with a meaning
defined by the C standard are 0, EXIT_SUCCESS and EXIT_FAILURE.
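
As a rough sketch of the two points above (message on stderr ending
in \n, and a portable failure status), something along these lines
could be used. The helper name open_or_report is my own invention and
is not part of the posted program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not in the posted program): open a file, or
   report the failure on stderr. Returns NULL on failure. */
static FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        /* The message goes to stderr and ends with '\n'. */
        fprintf(stderr, "Cannot open the file %s\n", path);
    }
    return fp;
}
```

The caller would then write something like
if (fs == NULL) exit(EXIT_FAILURE); rather than exit(1).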

}

ft = fopen(argv[2],"w");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[2]);
exit(1);
}

while( (ch=fgetc(fs)) != EOF)

You have not declared ch by this point. The exact definition of ch
is important to the program. For example, if it were declared as
'char' and 'char' happened to be unsigned on that system, then
it would not be possible for ch to compare equal to EOF, which is
always negative.
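
A minimal sketch of the usual idiom, with ch declared as int so that
it can hold every unsigned char value as well as EOF. The function
name count_chars is only for illustration.

```c
#include <stdio.h>

/* Count the characters in a stream. ch is an int, so it can hold
   every value fgetc() returns: any unsigned char value, or the
   distinct negative value EOF. */
static long count_chars(FILE *fp)
{
    int ch;
    long n = 0;

    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}
```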

{

if(ch == 32)

What is 32? If you mean a space, code a space, ' '. The numerical
values of particular characters are not specified in C.

{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);

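As the character set representation is not specified by C, it
is possible that ch+127 is itself a valid character in the character
set, in which case a decompressor could not tell the encoded pair
apart from that literal character.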

If the file ends in a 32 then that trailing 32 will be lost with
your logic.

I note that you do not open the file in binary mode. It could
happen that in the input, there were often space characters immediately
preceding end-of-line indicators. The end-of-line indicators would
be read as '\n' and that '\n' would be transformed by your compressor
to '\n'+127 which is unlikely to be an end of line indicator. You
could thus end up with output lines that exceeded the maximum text
output line size supported by the implementation. You could also
potentially happen upon characters for which the character + 127
came out as '\n', thus introducing an end of line where there was none
before.
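
One possible way to address the points above, as a sketch only. It
assumes the input is 7-bit ASCII (so codes 128..254 are free to stand
for "space plus character"), that both streams were opened in binary
mode, and that a trailing space should be kept. The names
compress_spaces and expand_spaces are mine, not the book's.

```c
#include <stdio.h>

/* Sketch of the space-pairing scheme, assuming 7-bit ASCII input and
   streams opened in binary mode ("rb" and "wb"). A space followed by
   a character c in 1..127 becomes the single byte c + 127 (128..254);
   everything else is copied unchanged. */
static void compress_spaces(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch == ' ') {
            int next = fgetc(in);

            if (next == EOF) {
                fputc(' ', out);        /* keep a trailing space */
                break;
            }
            if (next >= 1 && next <= 127) {
                fputc(next + 127, out); /* one byte for the pair */
            } else {
                fputc(' ', out);        /* cannot pair: copy both */
                fputc(next, out);
            }
        } else {
            fputc(ch, out);
        }
    }
}

/* Inverse of compress_spaces under the same assumptions. */
static void expand_spaces(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch >= 128 && ch <= 254) {
            fputc(' ', out);
            fputc(ch - 127, out);
        } else {
            fputc(ch, out);
        }
    }
}
```

If the input can contain bytes outside 7-bit ASCII, this encoding is
ambiguous and a different scheme would be needed.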

}
else
fputc(ch,ft);

}

fclose(fs);
fclose(ft);

return EXIT_SUCCESS;
}


Now my questions are as follows

1) Is there any other simple method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?

Yes, many of them, most equally inefficient. The code you give at
best compresses space followed by a character to a different character
code, and leaves everything else alone -- it doesn't even try to
compress runs of spaces into something more efficient. If the code
were to be applied to typical English text, it would produce a
more efficient output if, instead of compressing spaces, it compressed
'e', 't', 'a', 'i', 'o', or 'n', all of which occur in English text
with greater frequency than space does.
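
As an illustration of compressing runs of spaces, here is a minimal
run-length sketch. The format is my own assumption, not something
from the book: a run of 3 to 255 spaces becomes the marker byte 0xFF
followed by the run length, and 0xFF is assumed never to occur in the
(7-bit ASCII) input. Shorter runs are copied as they are, so the
output never grows.

```c
#include <stddef.h>

#define RUN_MARK 0xFF /* assumed never to occur in 7-bit ASCII input */

/* Encode src[0..n) into dst and return the encoded length.
   dst must have room for n bytes; the output is never longer
   than the input. */
static size_t rle_spaces(const unsigned char *src, size_t n,
                         unsigned char *dst)
{
    size_t i = 0, o = 0;

    while (i < n) {
        if (src[i] == ' ') {
            size_t run = 1;

            /* Measure the run, capped at 255 so it fits in a byte. */
            while (i + run < n && src[i + run] == ' ' && run < 255)
                run++;
            if (run >= 3) {
                dst[o++] = RUN_MARK;
                dst[o++] = (unsigned char)run;
            } else {
                for (size_t k = 0; k < run; k++)
                    dst[o++] = ' ';
            }
            i += run;
        } else {
            dst[o++] = src[i++];
        }
    }
    return o;
}
```

Whether this gains anything depends entirely on how many long runs of
spaces the input contains.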
--
"The whole history of civilization is strewn with creeds and
institutions which were invaluable at first, and deadly
afterwards." -- Walter Bagehot