Re: slurping in binary data
- From: George <george@xxxxxxxxxxxxxxx>
- Date: Fri, 21 Nov 2008 16:27:25 -0700
On Thu, 20 Nov 2008 02:13:43 GMT, James Kuyper wrote:
George wrote:
On Tue, 18 Nov 2008 12:26:05 GMT, James Kuyper wrote:
George wrote:
On Tue, 18 Nov 2008 00:59:20 -0800 (PST), Nick Keighley wrote:...
assuming you mean the twenty lines at the beginning. Would fgets()
followed by
int line_num;
char data[32];
fscanf (line, "%d %32s", &line_num, data);
do the job?
Why fgets before scanf?
Key point to keep in mind here: I was thinking of sscanf(), not fscanf()
(or scanf()). The fgets()/sscanf() combo is the best way I know of to
read most text-format files.
I hadn't even distinguished between these. Am I correct that
scanf
sscanf
fscanf
are the only ones that resemble one another?
Because scanf() treats newline characters the same way as any other
whitespace character. This is usually not the way they should be
handled. As a result, a single incorrectly formatted line can cause all
following lines to be handled incorrectly, causing bugs that can be a
real pain to track down.
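A minimal sketch of that point, using sscanf() on a string so that no file is needed. The helper name parse_pair is hypothetical, not from the thread. The newlines between the two fields are skipped like any other whitespace, so the call still reports two successful conversions:

```c
#include <stdio.h>

/* Hypothetical helper: parses "<int> <word>" from text with sscanf().
 * Returns the number of fields that were converted. */
int parse_pair(const char *text, int *num, char word[32])
{
    /* %31s leaves room for the terminating '\0' in a 32-byte buffer.
     * The space in the format matches any run of whitespace,
     * including newlines. */
    return sscanf(text, "%d %31s", num, word);
}
```

Calling parse_pair("7\n\n   0001", &n, w) returns 2 and stores 7 and "0001", even though the two fields sit on different lines. fscanf() and scanf() follow the same whitespace rule when reading from a stream.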
How does fgets know to stop?
It stops at the first newline or when the buffer you've provided it is
full, or at the end of the file, whichever comes first. The key point is
the "newline" - that's what makes this approach more robust when reading
line-oriented files.
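The three stopping conditions can be observed with a small sketch. It assumes tmpfile() is usable on the system, and the helper name count_fgets_calls is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: writes text to a temporary file, then counts how
 * many fgets() calls are needed to read it back with a buffer of bufsize
 * bytes (2..64). Returns -1 on failure. */
int count_fgets_calls(const char *text, int bufsize)
{
    char buf[64];
    int calls = 0;
    FILE *fp;

    if (bufsize < 2 || bufsize > (int)sizeof buf)
        return -1;
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    /* Each call stops after a newline, after bufsize-1 characters,
     * or at end of file, whichever comes first. */
    while (fgets(buf, bufsize, fp) != NULL)
        calls++;
    fclose(fp);
    return calls;
}
```

With a roomy buffer, "ab\ncd\n" takes two calls (one per newline) and "ab\ncd" also takes two (the second stops at end of file). With bufsize 3, "ab\ncd\n" takes four calls, because each line is cut after two characters and its newline arrives in a separate call.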
ok
Presumably, we want to start scanf'ing with '1'. Let me refresh your memory
of the data set. I call it george.txt to reflect my pseudonym.
1 0001000000000000001
2 0001000000000000001
3 10000011001000000000000001
4 10000011001000000000000001
5 10000011001000000000000001
6 10000011001000000000000001
7 10000011001000000000000001
8 10000011001000000000000001
9 10000011001000000000000001
10 10000011001000000000000001
11 100000001111100
12 100000001111100
13 100000001111100
14 1000001110110111100000000000001
15 1000001110110111100000000000001
16 1000001110110111100000000000001
17 1000001110110111100000000000001
18 1000001110110111100000000000001
19 1000001110110111100000000000001
20 0001000000000000001
It's 20 by forty. Given my premise that the first datum on each line is the
line number, do I still have to use fgets()?
If your input file is perfectly formatted, and your program is correctly
written, there's no need. However, I think it's poor design to write
code that fails catastrophically when given incorrect inputs. I believe
in designing programs so they fail gracefully when given bad input. That
means that they fail without undefined behavior, and with an informative
error message, if possible. It's a lot harder to achieve that goal with
fscanf() than it is with fgets()/sscanf().
What happens if the line number is missing from, for example, line 11?
With fscanf(), it will try to interpret 100000001111100 as a decimal
integer, and store it into the line number (with undefined behavior
unless INT_MAX is larger than that value), and then put "12" into the
data buffer. fscanf() will return a value of 2, indicating a successful
read, because it has no way of noticing that anything went wrong. With
fgets()/sscanf(), you can check whether sscanf()==2; if it does not, you
immediately know there's a problem with the line.
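One way to sidestep the overflow problem with %d is to convert the number with strtol(), which reports out-of-range input by defined means. This is a different technique from the sscanf() check described above, shown only as a sketch; the helper name parse_line_number is hypothetical:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts the leading decimal field of line into
 * *out. Returns 1 on success, 0 if the field has no digits or its value
 * does not fit in an int. */
int parse_line_number(const char *line, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line)                      /* nothing was converted */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                         /* too large for an int */
    *out = (int)v;
    return 1;
}
```

For "11 100000001111100" this stores 11 and returns 1. For a line that starts with 100000001111100 it returns 0 on implementations where that value exceeds INT_MAX, instead of invoking undefined behavior.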
Continuing processing despite a problem like that can be pointless, or
mandatory, or anywhere in between those two extremes, depending upon
your application. If you keep using fscanf(), it will attempt to read
100000001111100 as the line number and put "13" into the data buffer; it
will stay out of sync with the actual lines until the end of the file,
or the next incorrectly formatted line, whichever comes first.
With fgets()/sscanf(), fgets() will start cleanly at the next line, so
sscanf() can do exactly what you need it to do; the combination of those
two functions won't stay out of sync with the data, the way fscanf() would.
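The difference can be sketched on a three-line input whose second line has lost its line number. Everything here is illustrative: the helper names, the 64-byte line buffer (lines are assumed to be shorter than that), and the use of tmpfile() are assumptions, and the data values are kept short so that %d cannot overflow.

```c
#include <stdio.h>

#define MAXREC 8

/* Hypothetical helper: copies text into a temporary file, rewound for reading. */
static FILE *open_text(const char *text)
{
    FILE *fp = tmpfile();
    if (fp != NULL) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}

/* Reads records with fscanf() alone; stores accepted line numbers in nums.
 * Returns the number of accepted records, or -1 on failure. */
int read_with_fscanf(const char *text, int nums[MAXREC])
{
    char data[32];
    int n = 0;
    FILE *fp = open_text(text);

    if (fp == NULL)
        return -1;
    while (n < MAXREC && fscanf(fp, "%d %31s", &nums[n], data) == 2)
        n++;
    fclose(fp);
    return n;
}

/* Reads one line at a time with fgets(), then parses it with sscanf().
 * Returns the number of well-formed lines, or -1 on failure. */
int read_with_fgets(const char *text, int nums[MAXREC])
{
    char line[64], data[32];
    int n = 0, num;
    FILE *fp = open_text(text);

    if (fp == NULL)
        return -1;
    while (n < MAXREC && fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%d %31s", &num, data) == 2)
            nums[n++] = num;   /* well-formed line */
        /* otherwise the line is malformed: report or skip it;
         * the next fgets() starts cleanly at the following line */
    }
    fclose(fp);
    return n;
}
```

Given "1 0001\n0110\n3 0111\n", read_with_fscanf() accepts two records and reports line numbers 1 and 110, silently pairing the data of line 2 with the number of line 3. read_with_fgets() also accepts two records, but they are lines 1 and 3, and the malformed line 2 is detected where it occurs.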
#include <stdio.h>
#include <stdlib.h>

#define PATH "george.txt"
#define NUMBER 100
#define BIN 1000
#define MAXFMTLEN 2000

int main(void)
{
    FILE *fp;
    char pattern[MAXFMTLEN];
    char lnumber[NUMBER];
    char lbin[BIN];
    char line[MAXFMTLEN];

    if ((fp = fopen(PATH, "r")) == NULL) {
        fprintf(stderr, "can't open file\n");
        exit(1);
    }
    /* Each field width must be one less than the size of the buffer it
       fills: lnumber comes first, then lbin. */
    sprintf(pattern, "%%%ds %%%ds", NUMBER-1, BIN-1);
    while ((fgets(line, MAXFMTLEN, fp)) != NULL) {
        sscanf(line, pattern, lnumber, lbin);
        /*fscanf (fp, "%d %32s", &lnumber, lbin);*/
        printf("%s\n", lbin);
    }
    fclose(fp);
    return 0;
}
Q1) Does the while control satisfy your criticism above?
Q2) Why doesn't the sprintf have to *follow* the while?
whitespace crlf
whitespace crlf
1 0001000000000000001
2 0001000000000000001
3 10000011001000000000000001
4 10000011001000000000000001
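On building the format string: the pattern depends only on the buffer sizes, which do not change between lines, so it can be constructed once before the loop starts. A small sketch of what that construction produces, with a hypothetical helper name and snprintf() (C99) swapped in for sprintf() so that the destination size is checked:

```c
#include <stdio.h>

/* Hypothetical helper: builds a "%<w1>s %<w2>s" format for sscanf().
 * Each width should be one less than the size of the buffer it will fill.
 * Returns the length of the generated format, or a negative value on error. */
int build_pattern(char *pattern, size_t size, int w1, int w2)
{
    /* "%%" emits a literal '%'; "%d" emits the width as decimal digits. */
    return snprintf(pattern, size, "%%%ds %%%ds", w1, w2);
}
```

With widths 99 and 999 the generated format is "%99s %999s", which can then be passed to sscanf() on every iteration.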
--
George
If you're sick and tired of the politics of cynicism and polls and
principles, come and join this campaign.
George W. Bush
Picture of the Day http://apod.nasa.gov/apod/