Re: Heaps and Foreigners

From: Duane Rettig (duane_at_franz.com)
Date: 05/06/04


Date: 06 May 2004 09:07:24 -0700

tfb+google@tfeb.org (Tim Bradshaw) writes:

> Duane Rettig <duane@franz.com> wrote in message news:<4isfa3hi1.fsf@franz.com>...
>
> > So however large the file is, if just a second helping of that file size
> > can be allocated for swap space, then you're covered, right? If the
> > file might grow or be filled in arbitrarily, then you're going to run
> > out of file space as well as virtual memory space, but guess which one
> > will give you the unintelligible error diagnostic?
> >
> I thought about this some more walking to work, and I realised that
> there's a tacit assumption here that running out of file space is
> `better' than rolling over and dying.

No, there's no such assumption there. Obviously, it is better _not_
to run out of file space than either to run out of file space or to
roll over and die. But what is important in both of these failure
cases is not the fact that there is a failure (both cases have that)
but the potential for recovery (which, going back to our original
issue of swap-unbacked memory, is impossible there).

> It may be, but it can be
> catastrophically worse as well. Here's two examples that have
> happened to me, one in recent history, one longer ago.

 [ two excellent examples of worse-is-better C/unix programming elided ...]

> In both those cases I would much, much rather have dealt with a crash,
> or even repeated crashes.

And I would have much rather dealt with a warning that said "hey, your
disk space is getting low", _before_ the crash, or to have had the
failure reported before it turned into a crash.
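
A minimal sketch of that kind of early warning, in Python. The
function names and the 10% threshold are assumptions made for
illustration only; they are not taken from any of the programs
discussed in this thread.

```python
import shutil

def low_space_message(total, free, min_free_fraction=0.10):
    """Return a warning string if the free fraction is below the threshold, else None."""
    if total <= 0:
        raise ValueError("total must be positive")
    fraction = free / total
    if fraction < min_free_fraction:
        return "warning: only %.1f%% of disk space free" % (fraction * 100)
    return None

def check_disk(path=".", min_free_fraction=0.10):
    """Check the filesystem holding `path`; the threshold is a hypothetical default."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return low_space_message(usage.total, usage.free, min_free_fraction)
```

Calling check_disk() periodically, or just before a large write, reports
the problem while there is still room to recover from it.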

> The underlying issue here is that incompetent programmers write
> programs that don't work.

Interesting. Here is a truth that is logically flawed (or at least
not in its fully reduced form). The real truth is that _all_
programmers write programs that don't work, so whether or not they
are competent is irrelevant. At this point I would also normally
say that it is the competent programmers who look for the bugs in
their programs and crush them, but under that criterion we are all
incompetent, because after all that debug effort, there are still
bugs in the program.

> It isn't hard to check return codes or
> update critical files safely, but it is too hard for these people.

I don't think it's a question of being hard, but more of being
inconvenient. In order to check whether or not you are checking all
of the return codes, especially in cases where you are close to the
edge in resource-availability, you have to configure your test cases
to _be_ close to the edge of resource-availability. That's not
hard, just inconvenient. Those who do so tend to have more solid
programs.
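
One way to put a test case at the edge without actually filling a
disk is to inject the failure. The sketch below is in Python; FullDisk
and save_records are hypothetical names invented for the example, and
the stand-in object only imitates a device that runs out of space.

```python
import errno

class FullDisk:
    """Hypothetical file-like stand-in that runs out of space after `capacity` bytes."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.written = 0

    def write(self, data):
        if self.written + len(data) > self.capacity:
            # Same error a real write reports when the device is full.
            raise OSError(errno.ENOSPC, "No space left on device")
        self.written += len(data)
        return len(data)

def save_records(out, records):
    """Write each record; return (ok, records_written) instead of crashing."""
    count = 0
    for rec in records:
        try:
            out.write(rec)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return False, count  # report the failure to the caller
            raise                    # anything else is not handled here
        count += 1
    return True, count
```

A test can then run save_records against FullDisk objects of several
capacities and confirm that the out-of-space path is reported every time.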

> Similarly it isn't hard to set the heap limit for your application to
> physmem + swap - 1GB (if it's all the machine will run) or something
> equivalent if it's not. But it is too hard for these people.

See above. Not hard, inconvenient.
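
The arithmetic in the quoted suggestion can be sketched in Python as
follows. heap_limit is a hypothetical helper; how the result would be
handed to any particular Lisp or operating system is not shown, since
that depends on the implementation.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

def heap_limit(physmem, swap, reserve=GIB):
    """Return physmem + swap - reserve, all in bytes."""
    limit = physmem + swap - reserve
    if limit <= 0:
        raise ValueError("reserve leaves no room for a heap")
    return limit
```

For a machine with 8 GiB of physical memory and 4 GiB of swap,
heap_limit(8 * GIB, 4 * GIB) comes to 11 GiB.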

> I can see where Duane is coming from: he's an implementor and probably
> has a large number of more-or-less incompetent users who complain
> about his application because they're too incompetent to set the heap
> size limit that ACL provides. He probably never hears from the ones
> that set it correctly (`hey Duane, my program worked, thanks!').

Actually, we do. And some of the users that complain about our
application are some of the most competent ones! In fact, I see your
name in our spr database 30 times, and that's only as the primary
contact for the spr. I certainly wouldn't call you incompetent.
But in fact, many of these complaints tend to make our product better,
so I certainly don't mind them.

> And
> his users pay his salary, so killing them is not really a good option.
> Finally, he's a nice guy, he'd probably have qualms about killing
> them even if there wasn't money involved.

> But I'm a Unix systems guy: big boots, manuals used as clubs and
> semi-automatic weapons are my stock in trade. I might have been nice
> once, but that was so long ago I can't remember, frankly. I'm just
> not interested in putting up with people who can't cope with setting
> heap limits or dealing with the issues raised by overcommitted memory
> which can't be avoided: I just want to get paid. The Sig-Sauer P226 I
> keep in the tape safe deals with them effectively: it's a little messy
> afterwards, but they don't suffer.

All this talk of killing, here. How do you _really_ feel about
customers?

:-)

> There's also the underlying point that this fear of overcommit is kind
> of a medieval attitude (this isn't meant as an insult...). In the
> middle ages, people had really strict ideas about money: don't lend
> more than you have, don't lend at interest, don't make promises you
> can't keep.

Call me Sir Duane, then.

 [ examples of insurance companies and banks overcommitting ]

There are always risks when insuring or ensuring something.
These risks depend on the backing that is available, which
might include the probability that it might happen.

As an example, we California ski boat owners never bothered
winterizing our boats before 1990, because it never froze.
But in 1990 (I think, or it may have been 1991) we had a huge
freeze. On Christmas week pipes started bursting and had to be
fixed immediately, but we never thought any more about it until
the next spring, when we got out to the lake and our engine
compartment filled up with brown foam (the engine block, which
had still had water in it which had frozen, had cracked, thus
mixing a nice oil-and-water shake for us). We had to be towed
back to the shore. Amazingly, our insurance company was one of
two in California which paid out to have the engine fixed.
It went out of business the next year, but it had had the backing
for the disaster which it needed before it went under.

Several lessons:

 1. The insurance company, though not able to weather the financial
drain, did seem to have the backing to pay out to its customers.
I don't know if all of its customers got their engine rebuild paid
for (rebuilds which half the boats in California had to do), but they
paid for ours, and thus left us cleanly taken care of ("closed", in
the NFS sense of your second example).

 2. It would have been nice not to have found out about the failure
out on the lake, where we had to receive a tow. But, working backwards,
  a. People on lakes are usually happy to give others a tow, because
     they know that their turn will come
  b. The circumstances under which the failure occurred were not tested
     for by me; although I always test my engine out in dry-dock (by
     attaching a hose to the engine and running it with water running
     through the hose) it did not show the problem because the engine
     was not stressed, and so the cracked block did not leak. I.e.
     we were not close to the edge.
  c. It was sheer laziness on our part that we did not perform the
     winterizing that was required, coupled with the low risk of
     it happening. The fact that it happened once raised the risk
     level immediately.

 3. We now always winterize our boat. We learn from the failures
    how to not encounter the failures.

-- 
Duane Rettig    duane@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   

