Re: statistics::linear-model question



In article <1188974953.743153.131960@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Alexandre Ferrieux <alexandre.ferrieux@xxxxxxxxx> wrote:
On Sep 5, 7:59 am, Luc Moulinier <mou...@xxxxxxxxxxxxxxxxxx> wrote:
To clarify: Taking a new X which is not in the set of points used to
define the line, and which is outside the range of those points
(extrapolation), the regression line gives you an estimate of Y. I know
it is possible to calculate the error associated with Y; this error
depends on X (it gets bigger the further you move away along the line)
and also on the confidence level you want. But I don't know how to
compute it ....

Then what you're after is the Y std deviation.
The best you can do with the tools in that library is a 2nd order
approximation, hence for your extrapolated Xe, the Y distribution is

N(BXe+A,Sy)

Then to aim for a given confidence level, you take the inverse of
the erf() function. Since erf(1.4)~0.95, and the normal distribution
puts a factor sqrt(2) inside erf(), your answer is roughly
1.4*sqrt(2)*Sy, i.e. about 2*Sy.
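
As a cross-check on that multiplier, here is a minimal sketch in Python
(not Tcl, and not part of the statistics library discussed here). It
assumes Y really is normal with standard deviation Sy, and uses the
identity P(|Z| <= k) = erf(k/sqrt(2)) for a standard normal Z. The
function names two_sided_multiplier and coverage are made up for
illustration.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def two_sided_multiplier(confidence):
    """Return k such that P(|Z| <= k) == confidence for a standard normal Z.

    Assumption: the prediction error is exactly normal.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def coverage(k):
    """Return P(|Z| <= k) for a standard normal Z, written with erf()."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    k = two_sided_multiplier(0.95)
    print(k)              # close to 1.96
    print(coverage(k))    # close to 0.95
    print(coverage(1.4))  # noticeably less than 0.95
```

Under these assumptions the half-width for 95% comes out near 1.96*Sy;
a half-width of 1.4*Sy would cover only about 84%.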

-Alex



I applaud the precision with which Alexandre, in particular, has
followed up in this thread.

As we're speculating about what the original questioner *really*
is after, I perceive a few important qualitative differences that
deserve mentioning. Given a collection of (X,Y) data, it's indeed
reasonable to experiment with a least-squares regression as already
described. Very mild assumptions on the distribution of (formal)
errors in X and Y make the regression a relatively robust artifact
of analysis.

The error bounds are a more sensitive matter. Alexandre has of
course reported the pertinent calculation correctly. Note, though,
before reporting with confidence to your constituents, "I'm 83%
sure the new Y will be in *this* range", that those interval bounds
depend rather delicately on the details of the distributions of X
and Y. Prior knowledge that those will be exactly normal is ...
well, it's rarer than the managers of billion-dollar private equity
funds, for example, seem to recognize (catastrophic video available
on request).

I summarize: I'm somewhat more comfortable putting a best-fit curve
in the hands of naive consumers than I am doing the same with
confidence intervals.
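
For anyone who wants to see how the interval widens as X moves away
from the data, here is a rough sketch in Python (again not Tcl, and not
the statistics library's own API). It uses the textbook prediction
standard error for simple least squares,
s*sqrt(1 + 1/n + (Xe - mean(X))^2 / Sxx), and it leans on exactly the
assumptions cautioned about above: independent, normally distributed
errors of constant variance. It also substitutes a normal quantile for
the Student t quantile, which understates the width for small samples.
The names fit_line and prediction_interval are hypothetical.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Returns (a, b, s, xbar, sxx, n), where s is the residual standard
    deviation computed with n - 2 degrees of freedom.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are identical")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, xbar, sxx, n


def prediction_interval(xs, ys, xe, confidence=0.95):
    """Return (low, estimate, high) for a new observation at x = xe.

    Assumption: normal errors; a normal quantile stands in for Student t.
    """
    a, b, s, xbar, sxx, n = fit_line(xs, ys)
    # The (xe - xbar)^2 term is what makes the interval grow with distance.
    se = s * math.sqrt(1.0 + 1.0 / n + (xe - xbar) ** 2 / sxx)
    k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    estimate = a + b * xe
    return estimate - k * se, estimate, estimate + k * se


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(prediction_interval(xs, ys, 3.0))   # inside the data: narrowest
    print(prediction_interval(xs, ys, 10.0))  # extrapolation: wider
```

The fitted line itself needs none of the normality assumptions; only
the interval around it does.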