Re: mathematical problem, adjusting readings
- From: Eric Sosman <esosman@xxxxxxxxxxxxxxxxxxx>
- Date: Fri, 30 Mar 2007 00:05:14 -0400
Roedy Green wrote:
I probably could solve this myself, but I thought I would share this
problem with the group because you might find it interesting or mildly
challenging.
The problem is this. I sample the hit counts on my website each day,
ideally at noon. I then graph them. I also graph a moving average.
The problem is I am often late or early in my daily sampling, or the
website might be down, so I can't do it at noon.
When I sample late it generates the visual impression of a very good
day followed by a very bad day, even though nothing unusual actually
happened.
So here is the problem. How do I adjust the raw figures to get the
estimated figures for what would have been the count as of noon?
cut 1. Use linear interpolation.
cut 2. Account for the average distribution -- i.e. some times of
day are busier than others. I could create this by hourly sampling
over a week to discover the shape of the distribution curve.
Let's say for example it was a bell-shaped curve around noon. I could
provide you with a bell-shaped curve function, which you would in some
mysterious way use to adjust the daily figure. Just how is this curve
normalised?
Draw it as a cumulative distribution: the fraction of hits
that occur before Zero Hour (should be 0%), before ZH+1, ...,
ZH+24 (should be 100%). When you get a reading at ZH+-x, just
pretend that it's at the F(ZH+-x) point, and fudge the figures.
(I think economists say "seasonally adjusted" instead of "fudged.")
For example, if you're late by an hour and the curve says that
a day's first hour accounts for 3% of its traffic, assume that
the observed count represents 103% of a day, with 100/103 of it
belonging to the first day and 3/103 to the second.
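A minimal Python sketch of the cumulative-distribution adjustment. It assumes the traffic profile is supplied as 24 hourly shares starting at Zero Hour, that hits are spread evenly within each hour, and that the previous reading was on time; the names `cumulative_share` and `split_reading` are hypothetical. With 3% of traffic in the first hour, a reading one hour late is split in the ratio 100:3, as in the example above.

```python
def cumulative_share(hourly_share, hours):
    """F(hours): fraction of a day's hits arriving in the first `hours` hours
    after Zero Hour. Assumes 24 shares summing to 1, spread evenly per hour."""
    if abs(sum(hourly_share) - 1.0) > 1e-6:
        raise ValueError("hourly shares must sum to 1")
    if not 0 <= hours <= 24:
        raise ValueError("hours must be in [0, 24]")
    whole = int(hours)
    total = sum(hourly_share[:whole])
    if whole < 24:
        # linear within the partial hour
        total += (hours - whole) * hourly_share[whole]
    return total


def split_reading(count, hourly_share, offset_hours):
    """Split a count read offset_hours late (positive) or early (negative),
    assuming the previous reading was exactly on schedule.

    Returns (estimated count for one full day, leftover). The leftover
    belongs to the next day; it is negative when the reading was early,
    i.e. the reading fell short of a full day."""
    if not -24 < offset_hours < 24:
        raise ValueError("offset must be less than a day")
    if offset_hours >= 0:
        covered = 1.0 + cumulative_share(hourly_share, offset_hours)
    else:
        covered = cumulative_share(hourly_share, 24 + offset_hours)
    day = count / covered
    return day, count - day
```

For example, with a hypothetical profile of `[0.03] + [0.97 / 23] * 23`, `split_reading(1030, profile, 1.0)` gives roughly `(1000, 30)`.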
... but that's not what I'd suggest. At each reading, note
the total hits and total elapsed time since the Big Bang. This
gives you a plot of cumulative hits versus time, and you want to
estimate the derivative of this curve. Most published techniques
for numerical differentiation assume a regular sampling interval,
but that won't do because your fundamental problem is an irregular
sampling schedule.
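Any fixed origin works in place of the Big Bang. A sketch of that bookkeeping, assuming readings are logged as ISO 8601 timestamps paired with cumulative hit totals (the log format and the name `to_elapsed_days` are hypothetical); it turns them into points on the cumulative-hits-versus-time curve.

```python
from datetime import datetime, timezone

# Any fixed origin will do; only differences in time matter.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_elapsed_days(readings):
    """Convert (ISO 8601 timestamp, cumulative hits) pairs into
    (days since EPOCH, cumulative hits) pairs, sorted by time."""
    points = []
    for stamp, hits in readings:
        when = datetime.fromisoformat(stamp)
        if when.tzinfo is None:
            # Assumption: unlabelled timestamps are UTC.
            when = when.replace(tzinfo=timezone.utc)
        points.append(((when - EPOCH).total_seconds() / 86400.0, hits))
    points.sort()
    return points
```

A reading taken at 13:30 instead of noon simply lands 1.0625 days after the previous noon reading rather than exactly 1 day; nothing is discarded or forced onto the schedule.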
So: Take a span of N readings clustered around the moment of
interest, find the polynomial of degree N-1 that interpolates
at those points, and use the derivative of the polynomial to
estimate the derivative of the curve near the middle of the N
points. (This is the technique I use for the similar problem
of computing my car's gas mileage: I start with irregularly spaced
readings of total miles driven and total gasoline burned,
then I use N=3, interpolate with a quadratic, and take the
quadratic's derivative to estimate my rate of consumption as
of one fill-up ago.)
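A sketch of the interpolate-then-differentiate step, assuming the readings are (time, cumulative hits) pairs sorted by time. It differentiates the Lagrange form of the degree N-1 interpolating polynomial directly, so the sample times need not be evenly spaced. The names `interp_derivative` and `rate_near` are hypothetical, and picking the N readings nearest the moment of interest is one reasonable reading of "clustered around".

```python
def interp_derivative(xs, ys, t):
    """Derivative at t of the unique polynomial of degree len(xs)-1 through
    the points (xs[i], ys[i]). xs must be distinct; spacing may be irregular."""
    n = len(xs)
    slope = 0.0
    for j in range(n):
        # d/dt of the Lagrange basis polynomial L_j, evaluated at t
        d_basis = 0.0
        for k in range(n):
            if k == j:
                continue
            term = 1.0 / (xs[j] - xs[k])
            for m in range(n):
                if m != j and m != k:
                    term *= (t - xs[m]) / (xs[j] - xs[m])
            d_basis += term
        slope += ys[j] * d_basis
    return slope


def rate_near(points, t, n=3):
    """Estimate d(hits)/d(time) at t from the n readings closest to t.
    `points` is a list of (time, cumulative_hits) pairs."""
    if len(points) < n:
        raise ValueError("need at least n readings")
    nearest = sorted(points, key=lambda p: abs(p[0] - t))[:n]
    nearest.sort()  # back into time order
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    return interp_derivative(xs, ys, t)
```

With the default `n=3` this is the quadratic, three-point scheme from the mileage example. If time is measured in days, evaluating at each day's noon gives an estimated hits-per-day rate at noon.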
Some caveats are in order. First, numerical differentiation
is basically unstable: You cannot expect high accuracy. Second,
to the extent that it's trustworthy at all, it tends to be more
trustworthy in the middle of an interval than at the ends. Third,
the derivative often gets *worse* as N increases: high-order
polynomials tend to "wiggle" up and down, so their derivatives
oscillate strongly. Fourth, the method is not very good if the
real underlying function has a lot of spikes and jumps and changes
of behavior, or if the samples are not just irregularly spaced
but "chaotically" spaced. So: Use a smallish, odd N to get an
estimate near the middle of each span, and don't expect to cope
with readings that are enormously far off the scheduled times.
Further suggestion: Can't you get cron to capture the hit counts
for you, with an accuracy of plus or minus a few seconds? Machines
are better than people at punctuality.
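Even under cron, it is worth recording the actual time of each reading along with the count, so the elapsed-time bookkeeping described above still works if a run is delayed or the site is down. A sketch, assuming a hypothetical CSV log with one `timestamp,hits` row per reading; how the hit count is obtained depends on the site and is left to the caller. A crontab entry such as `0 12 * * * python3 record_hits.py` (script name hypothetical) would run it daily at noon, server time.

```python
import csv
from datetime import datetime, timezone


def record_reading(log_path, hits, now=None):
    """Append one 'ISO 8601 timestamp,cumulative hits' row to log_path.
    `now` can be supplied for testing; otherwise the current UTC time is used."""
    when = now if now is not None else datetime.now(timezone.utc)
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([when.isoformat(timespec="seconds"), hits])
    return when
```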
--
Eric Sosman
esosman@xxxxxxxxxxxxxxxxxxx