Re: slurping in binary data



On Thu, 20 Nov 2008 02:13:43 GMT, James Kuyper wrote:

George wrote:
On Tue, 18 Nov 2008 12:26:05 GMT, James Kuyper wrote:

George wrote:
On Tue, 18 Nov 2008 00:59:20 -0800 (PST), Nick Keighley wrote:
...
assuming you mean the twenty lines at the beginning. Would fgets()
followed by

int line_num;
char data[32];

fscanf (line, "%d %31s", &line_num, data);

do the job?
Why fgets before scanf?

Key point to keep in mind here: I was thinking of sscanf(), not fscanf()
(or scanf()). The fgets()/sscanf() combo is the best way I know of to
read most text-format files.

I hadn't even disambiguated these. Am I correct that

scanf
sscanf
fscanf

are the only ones that look like one another?


Because scanf() treats newline characters the same way as any other
whitespace character. This is usually not the way they should be
handled. As a result, a single incorrectly formatted line can cause all
following lines to be handled incorrectly, causing bugs that can be a
real pain to track down.

How does fgets know to stop?

It stops at the first newline or when the buffer you've provided it is
full, or at the end of the file, whichever comes first. The key point is
the "newline" - that's what makes this approach more robust when reading
line-oriented files.


ok

Presumably, we want to start scanf'ing with '1'. Let me refresh your memory
of the data set. I call it george.txt to reflect my pseudonym.

1 0001000000000000001
2 0001000000000000001
3 10000011001000000000000001
4 10000011001000000000000001
5 10000011001000000000000001
6 10000011001000000000000001
7 10000011001000000000000001
8 10000011001000000000000001
9 10000011001000000000000001
10 10000011001000000000000001
11 100000001111100
12 100000001111100
13 100000001111100
14 1000001110110111100000000000001
15 1000001110110111100000000000001
16 1000001110110111100000000000001
17 1000001110110111100000000000001
18 1000001110110111100000000000001
19 1000001110110111100000000000001
20 0001000000000000001

It's 20 by forty. Given my premise that the first datum on each line is the
line number, do I still have to use fgets()?

If your input file is perfectly formatted, and your program is correctly
written, there's no need. However, I think it's poor design to write
code that fails catastrophically when given incorrect inputs. I believe
in designing programs so they fail gracefully when given bad input. That
means that they fail without undefined behavior, and with an informative
error message, if possible. It's a lot harder to achieve that goal with
fscanf() than it is with fgets()/sscanf().

What happens if the line number is missing from, for example, line 11?
With fscanf(), it will try to interpret 100000001111100 as a decimal
integer, and store it into the line number (with undefined behavior
unless INT_MAX is larger than that value), and then put "12" into the
data buffer. fscanf() will return a value of 2, indicating a successful
read, because it has no way of noticing that anything went wrong. With
fgets()/sscanf(), you can check whether sscanf()==2; if it does not, you
immediately know there's a problem with the line.

Continuing processing despite a problem like that can be pointless, or
mandatory, or anywhere in between those two extremes, depending upon
your application. If you keep using fscanf(), it would attempt to read
100000001111100 as the line number and put "13" into the data buffer; it
will stay out of sync with the actual lines until the end of the file,
or the next incorrectly formatted line, whichever comes first.

With fgets()/sscanf(), fgets() will start cleanly at the next line, so
sscanf() can do exactly what you need it to do; the combination of those
two functions won't stay out of sync with the data, the way fscanf() would.

#include <stdio.h>
#include <stdlib.h>

#define PATH "george.txt"
#define NUMBER 100
#define BIN 1000
#define MAXFMTLEN 2000

int main(void)
{
    FILE *fp;
    char pattern[MAXFMTLEN];
    char lnumber[NUMBER];
    char lbin[BIN];
    char line[MAXFMTLEN];

    if ((fp = fopen(PATH, "r")) == NULL) {
        fprintf(stderr, "can't open file\n");
        exit(EXIT_FAILURE);
    }

    /* Build "%99s %999s" once; the widths must follow the argument
       order used below (lnumber first, then lbin). */
    sprintf(pattern, "%%%ds %%%ds", NUMBER-1, BIN-1);

    while (fgets(line, MAXFMTLEN, fp) != NULL) {
        /* Skip any line that does not yield both fields. */
        if (sscanf(line, pattern, lnumber, lbin) != 2)
            continue;
        /*fscanf (fp, "%d %32s", &lnumber, lbin);*/
        printf("%s\n", lbin);
    }

    fclose(fp);
    return 0;
}


Q1) Does the while control satisfy your criticism above?

Q2) Why doesn't the sprintf have to *follow* the while?

whitespace crlf
whitespace crlf
1 0001000000000000001
2 0001000000000000001
3 10000011001000000000000001
4 10000011001000000000000001
--
George

If you're sick and tired of the politics of cynicism and polls and
principles, come and join this campaign.
George W. Bush

Picture of the Day http://apod.nasa.gov/apod/