Re: reading text files

From: Dave Thompson (david.thompson1_at_worldnet.att.net)
Date: 12/20/04


Date: Mon, 20 Dec 2004 06:28:53 GMT

On Fri, 10 Dec 2004 02:54:02 -0500, "EkteGjetost"
<cheesemaker@gmail.com> wrote:

> I would like to first apologize to those of you who read my last post
> "desperately need help". As a regular on other forums i can understand how
> aggravating it would be to have someone come on who obviously doesn't know
> the community and asks for people to do their work for them.
>
> So i've come much more prepared this time.
>
> What my problem is, is that i need to write a program that will count the
> number of alphabetic characters, numbers, punctuation marks, and spaces
> from a text file.
>
As others have noted, the "best" solution, for common values of
"best", is to process the file once, keeping all four counts at the
same time. But even for the one-at-a-time approach you have, which may
be preferable or at least reasonable in some more complicated
situations, you have some pretty basic problems.

Since enough time has passed that this probably can't be homework --
and you're unusually polite -- I'll explain more completely.

> Here's what i've done so far.
>
> #include <stdio.h>
> #include <ctype.h>
>
> void countAlpha (FILE *infile, FILE *outfile, char alphabet);
> void countDigit (FILE *infile, FILE *outfile, char numbers);
> void countPunct (FILE *infile, FILE *outfile, char punctuation;
> void countSpace (FILE *infile, FILE *outfile, char spaces);
>
See below about the third parameter to these functions ...

> int main()
> {
> FILE *infile;
> FILE *outfile;
> char alphabet = 0;
> char numbers = 0;
> char punctuation = 0;
> char spaces = 0;
>
... and these variables.

> infile = fopen( "input.txt", "r");
> if(infile == NULL)
> {
> printf("Cannot read input file: input.txt\n");
> return 100;
> }
> outfile = fopen( "output.txt", "w");
> if(outfile == NULL)
> {
> printf("Cannot open outputfile: output.txt\n");
> return 100;
> }
>
A process exit status of 100 is not portable; standard C provides only
zero, plus EXIT_SUCCESS and EXIT_FAILURE from stdlib.h. Even on the
many systems where 0 to 255 works, 100 is an unusual value to choose.
I would suggest you use EXIT_FAILURE when posting here, just to avoid
unnecessarily repeated discussion of the issue, and, if you want,
change it to some other value on your own system(s).
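
As a rough sketch of the portable form (the helper name check_input is
made up for illustration and is not in the original program), the open
check could return a status that main simply passes along:

```c
#include <stdio.h>
#include <stdlib.h>  /* EXIT_SUCCESS, EXIT_FAILURE */

/* Hypothetical helper: try to open 'name' for reading and return a
   status that is safe to use as the return value of main. */
int check_input(const char *name)
{
    FILE *infile = fopen(name, "r");
    if (infile == NULL) {
        fprintf(stderr, "Cannot read input file: %s\n", name);
        return EXIT_FAILURE;  /* portable "something went wrong" */
    }
    fclose(infile);
    return EXIT_SUCCESS;      /* portable "all is well" */
}
```

With this, main could be as small as
"int main(void) { return check_input("input.txt"); }".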

> countAlpha(infile, outfile, alphabet);
> countDigit(infile, outfile, numbers);
> countPunct(infile, outfile, punctuation);
> countSpace(infile, outfile, spaces);
> return 0;

While the C runtime will fclose() all fopen'ed files for you, some
people, including me, consider it better to do so explicitly. This
also allows you to check for some errors, which for output files
especially don't "appear" until close, although in this case there
isn't much you could reasonably do if you do detect an error.
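
A minimal sketch of such a check, assuming a made-up helper named
write_and_close (not part of the original program) that writes one
result line and then closes the file:

```c
#include <stdio.h>

/* Hypothetical helper: write one line to outfile, close it, and report
   a failure from either step. Returns 0 on success, -1 on failure. */
int write_and_close(FILE *outfile, long n)
{
    int ok = fprintf(outfile, "count is %ld\n", n) >= 0;

    /* Buffered output may only reach the file here, so fclose can
       fail even though fprintf appeared to succeed. */
    if (fclose(outfile) == EOF)
        ok = 0;

    return ok ? 0 : -1;
}
```

About all the caller can do with a failure is print a message and
return EXIT_FAILURE, but that is still better than silently producing
a truncated output file.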

> }
>
> void countAlpha (FILE *infile, FILE *outfile, char alphabet)
> {

It is not necessary for 'alphabet' to be a parameter passed from the
caller -- the caller's value is not used, nor needed, for anything --
and is actively misleading. A local variable is better.

> fscanf(infile, "%c", alphabet);
>
The 3rd-and-up arguments to fscanf (and sscanf, and 2nd-and-up to
scanf) must be pointers; this passes and uses at best a completely
wrong pointer and quite possibly isn't even a working call.

If you made it fscanf (infile, "%c", &alphabet) it would be legal, but
except for errors, which you don't handle anyway, equivalent to
  alphabet = fgetc /* or getc */ (infile);
which is more specific and thus I think clearer.

> while(isalpha(alphabet));
> alphabet=getchar();
> // i'm pretty sure this while loop is where the problem is
>
It sure is. First, you've already been told that
  while (condition) ; /* dubious semicolon here */
is an empty loop -- it evaluates the condition; if true, it executes
an empty body and evaluates the condition again; et cetera. If as in
this case the condition has no side effects, if true the first time it
is still true every subsequent time and this is an infinite loop.

Even if you changed it to:
  while( isalpha(alphabet) ) /* no semicolon! */
    alphabet = getchar();
it tries to read from stdin not your selected input file; fix that and
  while( isalpha(alphabet) )
    alphabet = fgetc (infile);
is wrong logic: this counts the number of _consecutive_ alphabetic
characters at the beginning of the input (file). Plus, depending on
whether 'plain' char is signed on your system, it may malfunction at
end-of-file, which it reaches only if the input is entirely alphabetic.
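
To illustrate the end-of-file hazard, here is a small sketch, assuming
a mainstream platform where int is wider than char. Both function
names are invented for this demonstration:

```c
#include <stdio.h>

/* Hypothetical demo: does EOF survive being stored in an unsigned
   char (as happens where plain char is unsigned) and compared again? */
int eof_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)EOF;  /* wraps to a positive value */
    return c == EOF;                       /* positive vs. negative: 0 */
}

/* The same round trip through int, the type fgetc actually returns. */
int eof_survives_int(void)
{
    int c = EOF;
    return c == EOF;                       /* always 1 */
}
```

Where plain char is signed the comparison can appear to work, but then
a genuine input byte with all bits set may be mistaken for EOF, so
neither flavor of char is a safe place to keep the result of fgetc.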

> fprintf(outfile, "Alphabetic Characters: %c\n", alphabet);
>
Even if your loop above were correct, this would simply print the first
character encountered that is not alphabetic.

What you want is to read _every_ character from the file; count how
many are of the particular type(s) you are looking for; and then print
that _count_ (or those counts).

  int c = fgetc (infile);
  /* note not char; the return value of fgetc, getc, or getchar has
    an "extended" range: EITHER an unsigned char value, OR
    the value EOF which is a negative int usually -1 */
  int n = 0;
  /* or unsigned, or maybe long or unsigned long depending on
    how much input you want/need to handle */
  while( c != EOF ) {
    if( isalpha(c) )
      ++n; /* or n += 1 or n = n + 1 if you prefer */
    /* could do other types in parallel here */
    c = fgetc (infile);
  }
  fprintf (outfile, "count is %d\n", n); /* or %u %ld %lu */

or you can put the fgetc() call (once) within the condition:
  int c;
  int /* or whatever */ n = 0;
  while( (c = fgetc (infile)) != EOF )
    if( isalpha(c) ) ++n;

or if you really want you can use fscanf, but check the result:
  char c; /* not int; now the exception case is handled differently */
  int /* or whatever */ n = 0;
  while( fscanf (infile, "%c", &c) == 1 )
    if( isalpha((unsigned char)c) ) ++n;
    /* cast needed: a negative plain char is not a valid isalpha argument */
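
Putting the pieces together, the single-pass approach mentioned at the
top might look roughly like this. It is only a sketch: the struct and
function names are invented here, and it assumes "spaces" means
anything isspace accepts, which includes newlines and tabs.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type, not from the original program. */
struct counts {
    long alpha, digit, punct, space;
};

/* Read infile once, to end-of-file, and tally all four classes. */
struct counts count_classes(FILE *infile)
{
    struct counts n = {0, 0, 0, 0};
    int c;  /* int, not char, so EOF stays distinguishable */

    while ((c = fgetc(infile)) != EOF) {
        if (isalpha(c))      ++n.alpha;
        else if (isdigit(c)) ++n.digit;
        else if (ispunct(c)) ++n.punct;
        else if (isspace(c)) ++n.space;
    }
    return n;
}
```

The caller would then print each member with one fprintf apiece, e.g.
fprintf (outfile, "Alphabetic Characters: %ld\n", n.alpha);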

> and i just repeated the same things basically for each function after
> that.
>
> When i try to run this i get an error before what seems like anything else
> happens.

- David.Thompson1 at worldnet.att.net


