Re: Linear regression in NumPy

nikie wrote:
I'm a little bit stuck with NumPy here, and neither the docs nor
trial&error seems to lead me anywhere:
I've got a set of data points (x/y-coordinates) and want to fit a
straight line through them, using LMSE linear regression. Simple
enough. I thought instead of looking up the formulas I'd just see if
there isn't a NumPy function that does exactly this. What I found was
"linear_least_squares", but I can't figure out what kind of parameters
it expects: I tried passing it my array of X-coordinates and the array
of Y-coordinates, but it complains that the first parameter should be
two-dimensional. But well, my data is 1d. I guess I could pack the X/Y
coordinates into one 2d-array, but then, what do I do with the second parameter?

More generally: Is there any kind of documentation that tells me what
the functions in NumPy do, what parameters they expect, how to call
them, etc.? All I found was:
"This function returns the least-squares solution of an overdetermined
system of linear equations. An optional third argument indicates the
cutoff for the range of singular values (defaults to 10^-10). There are
four return values: the least-squares solution itself, the sum of the
squared residuals (i.e. the quantity minimized by the solution), the
rank of the matrix a, and the singular values of a in descending order."
It doesn't even mention what the parameters "a" and "b" are for...

Look at the docstring. (Note: I am using the current version of numpy from SVN;
you may be using an older version of Numeric.)

In [171]: numpy.linalg.lstsq?
Type: function
Base Class: <type 'function'>
String Form: <function linear_least_squares at 0x1677630>
Namespace: Interactive
Definition: numpy.linalg.lstsq(a, b, rcond=1e-10)
returns x,resids,rank,s
where x minimizes 2-norm(|b - Ax|)
resids is the sum square residuals
rank is the rank of A
s is the rank of the singular values of A in descending order

If b is a matrix then x is also a matrix with corresponding columns.
If the rank of A is less than the number of columns of A or greater than
the number of rows, then residuals will be returned as an empty array
otherwise resids = sum((b-dot(A,x)**2).
Singular values less than s[0]*rcond are treated as zero.
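
For the straight-line fit in the question, here is a minimal sketch of how the arguments could be set up. The sample data is made up for illustration, and rcond=None assumes a recent NumPy release rather than the version shown above. The 1-d x values become one column of the 2-d matrix a, a column of ones supplies the intercept, and the 1-d y values are passed as b.

```python
import numpy as np

# Hypothetical sample data: five (x, y) points lying roughly on a line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])

# Build the 2-d matrix "a": one row per data point, with columns [x, 1],
# so that a @ [slope, intercept] approximates y.
A = np.column_stack([x, np.ones_like(x)])

# Solve the overdetermined system in the least-squares sense.
# rcond=None selects the library's default cutoff in recent NumPy versions.
coeffs, resids, rank, s = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs

print("slope:", slope)
print("intercept:", intercept)
print("sum of squared residuals:", resids)
```

The column of ones is what lets the fitted line have a nonzero intercept; without it the fit is forced through the origin.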

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco