Re: mathematical problem, adjusting readings



Roedy Green wrote:
I probably could solve this myself, but I thought I would share this
problem with the group because you might find it interesting or mildly
challenging.

The problem is this. I sample the hit counts on my website each day,
ideally at noon. I then graph them. I also graph a moving average.
The problem is I am often late or early in my daily sampling, or the
website might be down, so I can't do it at noon.

When I sample late it generates the visual impression of a very good
day followed by a very bad day, even though nothing unusual actually
happened.

So here is the problem. How do I adjust the raw figures to get the
estimated figures for what would have been the count as of noon?

Cut 1. Use linear interpolation.

Cut 2. Account for the average distribution -- i.e. some times of
day are busier than others. I could create this by hourly sampling
over a week to discover the shape of the distribution curve.

Let's say for example it was a bell shaped curve around noon. I could
provide you with a bell shaped curve function, that you would in some
mysterious way use to adjust the daily figure. Just how is this curve
normalised?

Draw it as a cumulative distribution: the fraction of hits
that occur before Zero Hour (should be 0%), before ZH+1, ...,
ZH+24 (should be 100%). When you get a reading at ZH+-x, just
pretend that it's at the F(ZH+-x) point, and fudge the figures.
(I think economists say "seasonally adjusted" instead of "fudged.")
For example, if you're late by an hour and the curve says that
a day's first hour accounts for 3% of its traffic, assume that
the observed count represents 103% of a day, with 100/103 of it
belonging to the first day and 3/103 to the second.
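
The arithmetic in that example can be sketched in a few lines of
Python. This is only an illustration: the function name and its
arguments are made up here, and it assumes the previous reading was
taken on time, so the late count covers one full day plus the
fraction F(ZH+x) of the next.

```python
def split_late_reading(count, late_fraction):
    """Split a late count between the day that ended and the next one.

    late_fraction is F(ZH+x): the share of a typical day's hits that
    arrive between Zero Hour and the moment of the late reading.
    Assumes the previous reading was taken exactly at Zero Hour.
    """
    if not 0.0 <= late_fraction < 1.0:
        raise ValueError("late_fraction must be in [0, 1)")
    days_covered = 1.0 + late_fraction   # e.g. 1.03 "days" of traffic
    first_day = count / days_covered     # the 100/103 share
    next_day = count - first_day         # the remaining 3/103 share
    return first_day, next_day


# A reading of 1030 hits taken an hour late, with F(ZH+1) = 0.03,
# splits into roughly 1000 hits for the first day and 30 for the next.
first, second = split_late_reading(1030, 0.03)
```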

... but that's not what I'd suggest. At each reading, note
the total hits and total elapsed time since the Big Bang. This
gives you a plot of cumulative hits versus time, and you want to
estimate the derivative of this curve. Most published techniques
for numerical differentiation assume a regular sampling interval,
but that won't do because your fundamental problem is an irregular
sampling schedule.

So: Take a span of N readings clustered around the moment of
interest, find the polynomial of degree N-1 that interpolates
at those points, and use the derivative of the polynomial to
estimate the derivative of the curve near the middle of the N
points. (This is the technique I use for the similar problem
of computing my car's gas mileage: I start with irregularly-
spaced readings of total miles driven and total gasoline burned,
then I use N=3, interpolate with a quadratic, and take the
quadratic's derivative to estimate my rate of consumption as
of one fill-up ago.)
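
For N=3 the recipe reduces to a closed formula: differentiate the
Lagrange form of the interpolating quadratic and evaluate it at the
middle reading. A minimal Python sketch follows; the function name
and the (time, cumulative total) tuple layout are assumptions made
for illustration, with time in days and totals in hits.

```python
def rate_at_middle(p0, p1, p2):
    """Estimate d(total)/d(time) at the middle of three readings.

    Each reading is a (time, cumulative_total) pair. The times may be
    irregularly spaced but must be distinct. The result is the slope,
    at the middle reading, of the quadratic through all three points.
    """
    (t0, y0), (t1, y1), (t2, y2) = p0, p1, p2
    if t0 == t1 or t1 == t2 or t0 == t2:
        raise ValueError("reading times must be distinct")
    # Derivatives of the three Lagrange basis polynomials at t = t1.
    d0 = (t1 - t2) / ((t0 - t1) * (t0 - t2))
    d1 = (2.0 * t1 - t0 - t2) / ((t1 - t0) * (t1 - t2))
    d2 = (t1 - t0) / ((t2 - t0) * (t2 - t1))
    return y0 * d0 + y1 * d1 + y2 * d2


# Three readings: on time, 0.1 day late, on time again.
readings = [(0.0, 10000), (1.1, 11150), (2.0, 12000)]
hits_per_day = rate_at_middle(*readings)
```

Sliding this three-point window along the readings gives one rate
estimate per interior reading, which is why the newest reading only
gets its estimate once the following one has arrived.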

Some caveats are in order. First, numerical differentiation
is basically unstable: You cannot expect high accuracy. Second,
to the extent that it's trustworthy at all, it tends to be more
trustworthy in the middle of an interval than at the ends. Third,
the derivative often gets *worse* as N increases: high-order
polynomials tend to "wiggle" up and down, so their derivatives
oscillate strongly. Fourth, the method is not very good if the
real underlying function has a lot of spikes and jumps and changes
of behavior, or if the samples are not just irregularly spaced
but "chaotically" spaced. So: Use a smallish, odd N to get an
estimate near the middle of each span, and don't expect to cope
with readings that are enormously far off the scheduled times.

Further suggestion: Can't you get cron to capture the hit counts
for you, with an accuracy of plus or minus a few seconds? Machines
are better than people at punctuality.

--
Eric Sosman
esosman@xxxxxxxxxxxxxxxxxxx