Re: linux persistent c application with http protocol?



Ben Bacarisse wrote:
On a reasonably modern laptop each loop takes 3ms when the Apache server
is on the same machine (i.e. using the loopback interface).  Running
several of these loops together, so the locking starts to kick in, this
goes up to 8ms.  At no time did the CPU usage of the server go above 4% --
all the work was in the client trying to force the requests through (at
about 10 million a day).

Lastly I moved the script to my cheap hosting account.  Over a 1Mb ADSL
connection, I timed 116 wget executions.  It took 10.01s -- almost exactly
86.4ms each!  This time, of course, most of the time was taken by the
network delays so any locking delays were probably insignificant.  Using
multiple sources for the requests would certainly bring this down.

Conclusion: Apache+mod_php can do the job without breaking a sweat.

That's probably a viable solution.  I might add that your script, when
run across the ADSL link to your hosting account, is subject to network
latency, and not just for the data, but also for setting up and tearing
down the TCP connection, sending the HTTP request, etc.  It also
serializes all the requests.  A whole bunch of distributed clients will
not suffer the same problem, since their requests can overlap, which
means that (up to a point) the network latency stops limiting
throughput[1].
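To illustrate the point about overlap, here is a rough sketch that
replaces the real HTTP request with a plain sleep.  The 50ms delay, the
function names and the use of threads to stand in for independent
clients are all my own assumptions for illustration; this is not the
script that was timed above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # assumed per-request network delay in seconds (made-up value)

def fake_request(_):
    """Stand-in for one HTTP request: it only waits out the network delay."""
    time.sleep(LATENCY)
    return 1

def run_serial(n):
    """One client issuing n requests back to back, as a shell loop would."""
    start = time.perf_counter()
    done = sum(fake_request(i) for i in range(n))
    return done, time.perf_counter() - start

def run_overlapped(n, clients):
    """Several clients issuing requests at the same time (threads as clients)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        done = sum(pool.map(fake_request, range(n)))
    return done, time.perf_counter() - start

if __name__ == "__main__":
    n = 20
    _, serial = run_serial(n)
    _, overlapped = run_overlapped(n, clients=10)
    print(f"serial:     {serial * 1000 / n:.1f} ms per request")
    print(f"overlapped: {overlapped * 1000 / n:.1f} ms per request")
```

The serial run pays the full delay for every request, while the
overlapped run pays it roughly once per batch of clients, so the
per-request wall-clock figure drops even though no single request got
any faster.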

Another comment is that since requests will likely block waiting on
flushing the data to persistent storage, it might be worthwhile to
carefully choose what type of filesystem the counter file is stored on.
A journaling filesystem that can write small amounts of data (not
just metadata) into the log will be able to turn all these I/Os into
100% sequential access, at least until the log needs to be compacted.
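For concreteness, this is the sort of per-request work I have in mind,
sketched in Python rather than PHP.  The function name and the file
layout (a single decimal number in the file) are assumptions on my part,
not taken from the script that was tested; it is Unix-only because of
flock.  The fsync call is where each request would block on the
filesystem.

```python
import fcntl
import os

def bump_counter(path):
    """Increment the integer stored in `path` under a lock; return the new value."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # serialize concurrent requests
        raw = os.read(fd, 32).strip()
        value = int(raw) + 1 if raw else 1
        os.lseek(fd, 0, os.SEEK_SET)     # rewrite the file from the start
        os.ftruncate(fd, 0)
        os.write(fd, str(value).encode())
        os.fsync(fd)                     # block until the data is on stable storage
        return value
    finally:
        os.close(fd)                     # closing the descriptor releases the lock
```

Since the lock is held across the fsync, the time the filesystem takes
to commit that small write bounds how many increments per second a
scheme like this can complete.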

Overall, I think you are approaching the problem from exactly the
right point of view:  testing the assumption of whether the easy tools
can handle the capacity, which it sounds like they can.  Some more
testing might be in order (a million requests a day doesn't mean that
you get them all evenly spaced every 86.4ms -- they could all come
during one hour), but it seems like Apache plus mod_php are in the
right ballpark.
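The arithmetic behind that caveat, as a small sketch (the helper name is
made up for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def ms_per_request(requests, window_seconds):
    """Average time budget per request, in milliseconds, over the window."""
    return window_seconds * 1000 / requests

if __name__ == "__main__":
    # A million requests spread evenly over a whole day.
    print(ms_per_request(1_000_000, SECONDS_PER_DAY))
    # The same million requests squeezed into a single hour.
    print(ms_per_request(1_000_000, 60 * 60))
```

Spread over a day that is 86.4ms per request; squeezed into one hour it
is 3.6ms, which is in the same range as the 3ms to 8ms per loop measured
over loopback, so the bursty case is the one worth testing further.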

I suppose I was just taking the original poster's requirements at
face value when I suggested an Apache module.  But I still think
it would be faster if the performance really is needed, and the
original poster did say "millions" of requests each day, which,
being vague, could mean 100 million as easily as 1 million.  :-)

  - Logan

[1]  It does mean you'll have more active processes on the server
     since each one will have to wait around longer to finish its
     job, but that can almost always be solved by adding more RAM.  :-)