Re: Why is it dangerous?
- From: Paul Hsieh <websnarf@xxxxxxxxx>
- Date: Tue, 12 Aug 2008 17:44:31 -0700 (PDT)
On Aug 11, 1:38 am, James Dow Allen <jdallen2...@xxxxxxxxx> wrote:
On Aug 10, 6:25 pm, Antoninus Twink <nos...@xxxxxxxxxxxxxx> wrote:
...a_03.c:(.text+0x4d): warning: the `gets' function is dangerous
and should [never] be used.
Of course, this is nonsense. There is a perfectly safe way to use
gets(), namely by being in control of what appears on stdin.
Heresy! I'm surprised no one launched a diatribe here
against Mr. Twink, so let me offer a diatribe in support!
Six comments on gets().
First, some history. (Some c.l.c'ers weren't
even alive at the time of the infamous Internet
Worm.)
I was just learning the C language at the time. I was surprised to
learn that C had such weak string handling, but the advocates kept
insisting that it was a more powerful language than Pascal. Back
then, ideological indoctrinations still had some sort of effect on me.
The infamous worm was complex enough to attempt
exploitation of at least 4 different security loopholes,
but just one of the loopholes was ubiquitous enough
to make it necessary and sufficient for the Worm's
"success." That loophole was the dangerous use
of gets() in a program (fingerd) usually run with
superuser authority. It was the exploits of the
Internet Worm that led to the warnings against
gets(). (The exploiter, IIRC, wasn't a larcenous
"black hat", but rather a "gray hat" who deliberately
aroused the Unix community from its apathy about
such bugs.)
The fingerd->gets() exploit was not trivial.
Well, by today's standards it is. Exploiting gets() on an auto buffer
would probably be considered a minimum ability for any black hat
operating today. Even as a non-exploiter kind of programmer myself, I
know exactly how this is done and why and how it works.
The overrun buffer was an automatic variable just
below a procedure frame, whose return-address was
modified to point to executable code within the overrun
buffer. That code loaded and ran another program
(misnamed 'sh' or 'csh') which, among other things,
executed the 'fake finger' program to exploit the
fingerd->gets() bug on still other machines.
The detailed steps of this exploit get elided in the
retelling,
Wikipedia has most of what you need to know and pointers to a detailed
explanation.
[...] and programmers are left with the take-home
lesson: use gets() and the Russian mob will take
over your machine and the rest of the world.
Uh ... the Russian mob *does* use hackers for these sorts of things to
control machines to make money through spam and other sorts of fraud.
The gets(auto) exploit's primary realistic risk is in fact *exactly
that*.
One doesn't have to be a gets() enthusiast to note
fingerd's special nature, and that the claim that
all gets() usage risks catastrophe is confused.
fingerd's special nature is that it used an auto buffer to accept
input and it used gets(). This same risk and failure has been seen
multiple times in various incarnations. gets() just happens to have
the unusual property that you *cannot* avoid a problem when you use
it. I.e., you cannot code around it.
Second, a confession.
Whenever I build the Index to the Fabulous Pedigree
http://fabpedigree.com/altix.htm
I do several hundred thousand gets()'s, but none of
them are "dangerous".
That's because you have an external specification for your input that
limits its size (and your compiler has an optimistic implementation of
gets). The point about gets() is that it was enshrined by a standard
that omitted any standard on what could or could not be input. While
it's easy to see why such a standard is silly, it was hard for the
standards committee to see that this leads to a problem with gets().
And for you, you just go ahead and set a standard for input; you know,
the one I just called silly.
[...] I live with a few "dangerous"
messages during the build (although I'm sure the pedants
would prefer that each of the several hundred thousand
gets()'s produced its own such message.) :-)
Perhaps there's a way to disable gcc's "dangerous"
message but, in keeping with the FSF philosophy, I'm
sure the cure is worse than the disease, something like
setenv IM_AN_UNREPENTANT_MORONIC_***
In what sense is removing this message worth it? Why not just call
fgets() and nail the trailing \n?
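To make the fgets() suggestion concrete, here is a minimal sketch of a gets()-like helper. The name get_line() is made up for illustration; it assumes the caller passes the real size of the buffer, and it strips the newline the way gets() does. Unlike gets(), a line longer than the buffer is cut short and the remainder is left in the stream for the next call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line like gets(), but never write past
 * buf[size - 1].  Returns buf, or NULL on EOF/error with nothing read.
 * If the line does not fit, the rest stays unread in fp. */
char *get_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;

    /* strcspn() gives the index of the first '\n', or the string length
       if there is none, so this drops the newline only when present. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Usage is gets() plus two extra arguments, e.g. get_line(buf, sizeof buf, stdin). Whether to discard or keep the rest of an over-long line is a design choice; this sketch keeps it so that no input is silently lost.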
Third, a boast:
Since another of my "eccentric" codings, in private
throw-away code, is to *not* test malloc()'s return for
zero (failure leads to a core dump, which is what I want
anyway(*)), I'm sure that many in this ng believe that
James Dow Allen's code is buggy!
Why would I listen to your confessions to determine that? I have
looked at your code (on your website). Buggy is not the right word
for it. *Narrow* is more what I would use.
[...] I do not believe
this is the case. When I was rehired after a year to
add new support to a complete OS I wrote as a contractor,
I was pleased to note that no changes had been found
necessary to my delivered code. Code reliability
doesn't require ingenuity (indeed the two may be
inversely related!); it requires conscientiousness
and avoiding the cheap substitution of dogma for thought.
Or perhaps an undemanding audience? Can I obtain your OS and run the
Firefox browser on top of it?
AFAIK, I've never used gets() in code I've delivered
to a customer. (This is partly because most of my delivered
code has been OS or standalone, with any stdio library
calls unavailable.) I do use gets() sometimes, on
private code, when the gets()'ed string was itself
machine-produced. The gets() buffer is usually at least
ten times as large as the longest machine-produced
string. The executables are protected from the
Internet by an Impenetrable Firewall. If someone does
break into my house, intending computer mischief,
I'd be surprised if his mischief needed to invoke gets().
The gets() deprecators aren't wrong; indeed I'll cheerfully
concede that their position is more defensible than mine!
But I'm happy to take a Devil's Advocate position to
encourage critical thinking when I see the preposterous and
dogmatic over-generalizations which become so routine in
this ng. Is gets() a *potential* source of bugs? Obviously.
But I'd love to organize a wager, between me and one of
the pedants, on whose code contains more *actual* bugs.
I'm sure you could peg me as the worst of the gets "pedants" though I
don't post much here any more. Richard Heathfield takes a strong
principled position against it, but he only scratches the surface of
what is obvious about gets(), then invokes his judgmental nature to
say it's evil. I honestly take the position that gets() should, as a
side effect, attempt to delete the source file that contained it the
moment it is run. Instructing programmers that gets() is dangerous is
pointless if we just put it into documents, or compiler warnings or if
you laugh at someone for using it. Modern programmers don't learn
that way.
I have had my code actually tested against that of other OS developers and
compared fairly well. So I am up for it. How do you propose we do
this comparison and what would we wager?
* - Detractors will argue that what I *should* want to
do is spend hours writing a diagnostic for such malloc()
failures!
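For what it's worth, such a diagnostic is a few lines, not hours. A minimal sketch, assuming a hypothetical wrapper named xmalloc() (a common convention, not a standard function): it prints the size of the failed request and then calls abort(), whose SIGABRT normally still produces the core dump on Unix systems where core dumps are enabled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: malloc() that never returns NULL to the caller. */
void *xmalloc(size_t n)
{
    /* malloc(0) may return NULL without failing, so ask for 1 byte instead. */
    void *p = malloc(n != 0 ? n : 1);

    if (p == NULL) {
        fprintf(stderr, "xmalloc: out of memory requesting %zu bytes\n", n);
        abort();    /* SIGABRT: default action normally dumps core */
    }
    return p;
}
```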
Or we might argue that you are misusing your time by not doing things
with a safer programming model in the first place. For example, your
whole "family tree" project -- you did the whole thing in C, didn't
you? And you chose C because it was convenient to what you know, and
you think it's actually tighter or something, even though you are using
massively over-sized buffers to hold those strings. And how do you
deal with the growing list of tree nodes? Do you painstakingly
allocate each one, then free them all? Of course you do, you revere
K&R as if they were gods. That's a job for Python if ever I saw one.
Even if you do it in C, you are going to want to write a special tree
node allocation pool layer and use a string library.
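A tree-node pool of the kind suggested here can be small. The sketch below is illustrative only: struct node, struct node_pool and the pool_* functions are hypothetical names, and the chunk size of 256 is arbitrary. Nodes are carved out of larger chunks, so there is one malloc() per 256 nodes, and the whole pool is released with a single call. Because the pool is a value the caller owns rather than a set of globals, a program can run any number of them side by side.

```c
#include <stdlib.h>

#define NODES_PER_CHUNK 256   /* arbitrary; tune to taste */

/* Hypothetical node type for a pedigree-style tree. */
struct node {
    const char  *name;
    struct node *father;
    struct node *mother;
};

/* One block of nodes; blocks are chained so they can be freed together. */
struct chunk {
    struct chunk *next;
    size_t        used;
    struct node   nodes[NODES_PER_CHUNK];
};

struct node_pool {
    struct chunk *head;   /* most recently allocated chunk */
};

void pool_init(struct node_pool *pool)
{
    pool->head = NULL;
}

/* Returns an uninitialized node, or NULL if malloc() fails. */
struct node *pool_alloc(struct node_pool *pool)
{
    if (pool->head == NULL || pool->head->used == NODES_PER_CHUNK) {
        struct chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->next = pool->head;
        c->used = 0;
        pool->head = c;
    }
    return &pool->head->nodes[pool->head->used++];
}

/* Frees every node handed out by this pool in one pass. */
void pool_free_all(struct node_pool *pool)
{
    while (pool->head != NULL) {
        struct chunk *next = pool->head->next;
        free(pool->head);
        pool->head = next;
    }
}
```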
[...] In fact I don't want to do anything about them
since the smallish malloc()'s I use to build the website
Aren't Going To Fail(tm). (The pedants will respond to
this with some nonsense about how the website building
may be ported, some day, to the limited-memory chip
inside my car's fuel injection system !)
No, it's like your hash table. Your program can only ever support one
of them, so its flaw is more fundamental. It should be more of a "what
if I wanted to deploy a million such websites simultaneously" issue.
Fourth, a peeve:
fgets() preserves LineFeeds, gets() discards them.
Either behavior is fine (an application whose
stringency requires special treatment of an
"unterminated last line" probably will avoid
fgets() for other reasons anyway), but similarly-named
functions *SHOULD BEHAVE SIMILARLY*.
Assuming gets() came first and it was too late to
redefine it, fgets() should have either handled LineFeeds
the same, or have been given an obviously different name.
Or you could have code that did trimming. Because that's what
everyone does anyway.
Whoever created the disparity in these similarly-named
functions should have done to him what Jesse J. secretly
claimed to want to do to Obama.
I *might* have changed from gets() to fgets() on some
of my private code if it weren't for the above nit.
(And yes, I *do* know how to do
if (*s == '\n') *s = 0;
in C.)
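There is a practical reason for the LineFeed asymmetry: the newline that fgets() keeps is what lets the caller tell a complete line from one that was cut off at the buffer size, or from an unterminated last line. A minimal sketch, with a hypothetical read_line() name and return convention: it strips the newline like gets(), but reports whether the line fit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper.  Reads one line into buf and removes its newline.
 * Returns  1 if a whole line was read (including an unterminated last line),
 *          0 if the line was longer than the buffer (the rest is still unread),
 *         -1 on end of file or read error with nothing read. */
int read_line(char *buf, int size, FILE *fp)
{
    size_t len;
    int c;

    if (fgets(buf, size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';      /* complete line: drop the newline */
        return 1;
    }

    /* No newline: peek at the next character to see whether the line
       continues or the input ended without a final newline. */
    c = getc(fp);
    if (c == EOF)
        return 1;
    ungetc(c, fp);
    return 0;
}
```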
Fifth, an oft-overlooked truism:
Programming (and much real-world activity) involves
compromise between thoroughness and convenience.
I always thought it was: speed, correctness, minimal development-time,
pick any two. Personally, I try to deal with correctness by using
cookie-cutter patterns. But this involves developing a lot of
patterns and building up libraries that let you escape from C's
nonsense. So I am delivering on quicker development-time by reusing
development time from prior projects.
Giving up on thoroughness is basically never worth it.
strncpy(), for example, can do everything(*) strcpy()
can do, *except*, when properly coded, overrun a buffer.
In other words, the *only* reason to ever use strcpy()
(besides deliberately creating a security loophole!)
is the convenience of a 2-argument function call compared
with a 3-argument call. (* -- yes, strcpy() doesn't
null-pad. Any c.l.c'er ever write code that relied
on the *non*-padding?)
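Since the claim hinges on strncpy() being "properly coded", it may help to spell out what that involves: strncpy() writes no terminating '\0' when the source is at least as long as the count, so the caller has to add one. A minimal sketch, using a hypothetical copy_str() helper that also reports truncation:

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, whose capacity is dstsize bytes,
 * without overrunning it.  Returns 1 if all of src fit, 0 if truncated. */
int copy_str(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return 0;

    /* Copies at most dstsize - 1 bytes; shorter sources are zero-padded
       up to that count (the padding mentioned above). */
    strncpy(dst, src, dstsize - 1);

    /* strncpy() leaves this byte untouched on truncation, so terminate here. */
    dst[dstsize - 1] = '\0';

    return strlen(src) < dstsize;
}
```

A caller can then write if (!copy_str(name, sizeof name, input)) and decide what truncation should mean for that input, a signal strcpy() cannot give.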
As long as you think the debate is between one K&R function and
another, you cannot escape the trap they leave you in.
Thoroughness is not wrong, *BUT YOU SHOULD SPEND YOUR
THOROUGHNESS WISELY*. The original Hubble Telescope
program spent $10,000 studying whether or not to do a
$3 Million test. Meanwhile the flaw, that showed up
post-launch, could have been found with a simple $50 test.
I'll bet some engineer would have done the $50 test
if not dizzied by the testing paperwork requirements
dictated by pedants.
That's the wrong lesson. The right lesson: the only way to catch
everything via a test is to test everything. The most complicated
things tend to get the most attention, meaning they tend not to be as
high a risk for failure as you intuitively think.
Finally, let's note that programming and lawyerism
are different crafts.
The Authorities(tm) who post so pedantically in this
ng are often not completely wrong, but their pretentious
comments about gets() show confused thinking. In
particular, I wonder if some of them are law school
dropouts.
But you draw a false dichotomy. Remember the standard *ENDORSED*
gets() for a long time, and all the official drafts continue to
endorse it. The rationale gave a BS explanation for why it is there.
They also continue to support idiotic functions like strtok(). The
*lawyers*, like you, technically have to stand behind some supposed
proper usage of these functions.
To really get an objection to gets() you have to go to the real world
programmers, not the lawyers. It is only through repeated
embarrassment of the lawyers that we have gotten them to deprecate it from
the standard (only 20 years after the Morris Worm.)
Doug Gwyn telling the story of how he desperately was trying to save
gets() by making the library into even more of an abomination is
actually quite amusing in a sad kind of way. Somebody on that
committee must have had some shred of decency left in him/her
sufficient to the task of dropping the axe on this blight. But they
cannot erase the 20 year stain it has left.
When I mention the gets()'s that I use, in private,
behind my Impenetrable Firewall(tm), on strings generated
by my own Bugfree Software(tm), they never acknowledge
that some gets()'s are less dangerous than others
but instead reject "safe" usages of gets() based on
pipe(fd);
dup2(fd[0], 0);
write(fd[1], "Hello world\n", 13);
printf("%s\n", gets(buff));
on grounds that the semantics of pipe(), etc. are *not
guaranteed* by the C Standard(tm).
So you copy as a side effect of IO, rather than making two copies
yourself. Where did this number 13 come from and what happens when
you decide to correct the grammar mistakes in your string some years
later, but forget where the 13 came from? See you can't avoid the
embarrassment.
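One way to avoid the stray 13 is to let the compiler count the bytes. The sketch below is POSIX-only, uses a hypothetical helper name, echo_through_stdin(), and reads the line back with fgets() rather than gets(). It assumes nothing has been read from stdin beforehand, since stdio could otherwise still hold buffered input from the old descriptor. Note that the quoted snippet's 13 also pushes the string's terminating '\0' into the pipe; sizeof msg - 1 sends exactly the 12 visible bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: route a fixed message through a pipe into stdin
 * and read it back with fgets().  Returns 0 on success, -1 on failure. */
int echo_through_stdin(char *buf, int size)
{
    static const char msg[] = "Hello world\n";
    const size_t len = sizeof msg - 1;    /* byte count without the '\0' */
    int fd[2];

    if (pipe(fd) == -1)
        return -1;

    /* Make descriptor 0 the read end (unless pipe() already returned 0). */
    if (fd[0] != 0) {
        if (dup2(fd[0], 0) == -1) {
            close(fd[0]);
            close(fd[1]);
            return -1;
        }
        close(fd[0]);
    }

    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[1]);
        return -1;
    }
    close(fd[1]);        /* reader will see EOF after the message */

    clearerr(stdin);     /* forget any earlier EOF/error state on stdin */
    if (fgets(buf, size, stdin) == NULL)
        return -1;
    return 0;
}
```

Editing the message later cannot leave a stale length behind, because the count is derived from the array itself.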
If anyone has trouble understanding the absurdity and
hypocrisy of this legalistic view, I refer them to answers
previously given, here in the ng.
The "legalistic view" is to support the usage of gets() (like you are
doing) until the new standard is published.
--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/