Re: help with statistics library



Richard Weeks <rweeks@xxxxxxxxxx> wrote:
http://members.shaw.ca/bystander/statsource.html

My first impression after a short look is that it looks rather
well-written and I so simply start with nit-picking:

1) Your always using an int for the sizes of the arrays you pass
to the functions. But since there is a size_t type introduced
for exactly this purpose why not use that? The same holds for
the loop variables used to iterate over arrays.

2) a_mean() could easily rewritten as

double a_mean(double *datalist, size_t listsize) {
return sum(datalist, listsize) / list_size;
}

A rule I learn to appreciate more and more is "Don't repeat
yourself" and by having the same code in a_mean() and sum()
that rule is violated.

3) The DBL_ISEQUAL macro defined in the header file looks fishy. You
define it as

#define DBL_ISEQUAL(a,b) ((-DBL_EPSILON)<((a)-(b))&&((a)-(b))<(DBL_EPSILON))

DBL_EPSILON is the minimum x so that 1.0 + x != 1.0, so your macro
will work for a and b having a similar order of magnitude to 1, but
if DBL_EPSILON is e.g. 1e-9 and 'a' is e.g 1.0e-23 and 'b' 1.1e-23
it will falsely flag them as equal even though 'a' and 'b' differ
by 10% and they probably shouldn't be treated as being equal. I
guess you need to scale DBL_EPSILON to match the values of 'a' and
'b'.

4) There are a few instances where you cast where it isn't necessary.
E.g. if you divide a double by an int you don't have to cast the
int to double, the compiler will do that for you. The same holds
for instances where you assign an int to a double. Of course, the
cast doesn't hurt, but as a rule I avoid casts unless they are
really necessary and thus see casts as a "red flag" that says
"here's something happening that may need re-examination".

5) In std_norm_ptile() you use the comma operator too much for my
liking. E.g.

if(k >= .01 && k <= .1) x = -2.5, limit = -1.25;

is probably better written as

if ( k >= 0.01 && k <= 0.1 )
{
x = -2.5;
limit = -1.25;
}

Obviously, it will behave exactly the same but it's probably
going to be simpler to parse for someone reading the code (and
the compiler will probably emit exactly the same machine code).

6) I guess you shouldn't export the compare() function. As far as I
can see it's only for use within your code, so it probably would
be better to define it as static and not to include it in the
libraries header file.

7) Defining macros with names as 'PI' and 'E' can be problematic, at
least as you expect this library to be used widly. Beside not
forcing those names into the namespace of the user I also would
recommend to use a few more digits, on many systems a double has
at least 14 to 16 significant digits and there may be many where
there a lot more. Finding more precise values for these constants
shoudn't be hard.

8) Again, if you expect your library to get a wider user base you
probably should avoid using too simple names for your functions.
Expect your users to come up with simple function or variable
names like "mean", "mode", "min_max", "agm" etc. and help them
avoid clashes with the names from your library by e.g. prepen-
ding the names of all your functions with e.g. "rwsp_' (short
for "Richard Weeks Statistics Package") so that they don't
have to worry if a simple function or variable name is already
use by your library.

Take all these points with a grain of salt. With the exception of
the problem with the DBL_ISEQUAL macro it's probably mostly not
much more than nit-picking. And I didn't go through all functions
very carefully or did read up on those where I don't have any idea
what they are used for (I am not an expert on statistics!). All in
all it was a pleasure to see an example rather well-written code -
I would love to see examples of that more often;-)

Regards, Jens
--
\ Jens Thoms Toerring ___ jt@xxxxxxxxxxx
\__________________________ http://toerring.de
.



Relevant Pages

  • Re: help with statistics library
    ... Your always using an int for the sizes of the arrays you pass ... Some sources simply say "avoid comparing doubles for equality." ... E.g. if you divide a double by an int you don't have to cast the ... libraries header file. ...
    (comp.programming)
  • [PATCH v2]kernel: remove mess codes
    ... void __init ... static int __init ... have to play switch_stack games or in some way use the pt_regs struct. ... How to Apply These Terms to Your New Libraries ...
    (Linux-Kernel)
  • Re: Newbie question about malloc
    ... "Implicit int" no longer exists in C. ... It's generally recommended to NOT cast the return from malloc. ... %d can never be the correct format specifier for a size_t. ... C99 introduced %zu for this purpose, ...
    (comp.lang.c)
  • Re: what will happen after i use free()???
    ... Inserting a cast before malloc is not needed, ... 56 bits while int is only 48 bits. ... So casting that value requires some ... Use cast only if you are absolutely sure that yor really needs the ...
    (comp.lang.c)
  • Re: about the array
    ... 3- If you cast to the wrong type by accident, ... are malloc and free. ... the compiler should issue a diagnostic - gcc ... unknown set of arguments, and return an int. ...
    (comp.lang.c)