Re: CL Scaling for High Traffic Web Sites



On Apr 29, 2:07 pm, "Alex Mizrahi" <udode...@xxxxxxxxxxxxxxxxxxxxx>
wrote:
(message (Hello 'bob)
(you :wrote :on '(29 Apr 2007 12:48:00 -0700))
(

b> I guess I could use sticky sessions on the load balancer. However,
b> there's still the problem with caching database queries in memory once
b> they grow too big for in-memory data structures and have to be stored
b> on disk constantly. Caching them in local hash tables would result in
b> a lot of duplication. This is a fairly common problem nowadays.

a lot of duplication where? if it's in the memory of different machines, that
is actually what caching is meant to do -- keep data closer. if the duplication
is on the same machine, that should be eliminated..

The idea of memcached is to make use of existing memory as cheaply as
possible, retrieving data from multiple machines through a
centralized interface, while offering orders of magnitude better
performance than disk. By duplication I mean that each machine caches
disk data locally in its own memory, so there can be a lot of overlap
between machines. In many memcached setups, each existing web server
sets aside a slice of its memory for memcached, to be shared among all
servers. news.yc is targeted toward a niche group, and Paul Graham
mentioned that he hopes never to expose news.yc to reddit-scale
traffic.
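
To make the duplication point concrete, here is a minimal sketch (in Python rather than CL, purely for illustration). It compares per-machine local caches with a pool where each key lives on exactly one server, chosen by hashing the key. The names `server_for` and `ShardedCache` are made up for this example, the "servers" are just in-process dicts, and the modulo hashing is a simplification, not memcached's actual client algorithm.

```python
import hashlib


def server_for(key: str, n_servers: int) -> int:
    """Pick a server index from a stable hash of the key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_servers


class ShardedCache:
    """Toy stand-in for a shared cache pool: one dict per 'server'."""

    def __init__(self, n_servers: int):
        self.servers = [dict() for _ in range(n_servers)]

    def set(self, key: str, value) -> None:
        self.servers[server_for(key, len(self.servers))][key] = value

    def get(self, key: str):
        # Returns None on a miss, like a cache lookup that falls through.
        return self.servers[server_for(key, len(self.servers))].get(key)

    def total_entries(self) -> int:
        return sum(len(s) for s in self.servers)


keys = [f"query:{i}" for i in range(100)]

# Local hash tables: each of 4 web servers ends up caching the same results.
local_caches = [{k: k.upper() for k in keys} for _ in range(4)]
local_total = sum(len(c) for c in local_caches)

# Shared pool: each result is stored once, on whichever server its key maps to.
shared = ShardedCache(4)
for k in keys:
    shared.set(k, k.upper())
shared_total = shared.total_entries()

print(local_total, shared_total)
```

With four machines the local approach holds four copies of every cached result, while the pooled approach holds one, at the cost of a network hop on lookups.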


b> address space. DragonFly BSD has some features that make it easy to
b> do this over a slow and loosely-connected ethernet (they aim to do
b> this efficiently over the internet in the future!).

i think HT over network will introduce at least an order-of-magnitude overhead
compared to direct hash tables.
but it might still be acceptable if it brings benefits compared to RDBMS
performance.
i didn't evaluate memcached performance, but i saw RDBMS performance
killed by network communication -- sending each packet introduces
some overhead, and applications were sending megatons of those packets.. so
time was spent not in the web server or database, but in OS system time..
certainly, a network HT might have a more optimal protocol than an RDBMS over
the network, or you can fine-tune requests better..

DragonFly BSD locks each thread to its own CPU, so data won't be
flying all over the place unnecessarily. Only the data that needs to
be shared is passed around (e.g. a global hash table frequently
accessed by multiple machines). The main advantage of this over
memcached is that you don't have to serialize objects; you just leave
them as they are. Since you can't serialize things like closures
directly, this can save you some headaches during development.
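
The closure problem shows up in most languages, not only CL. As a small illustration, assuming Python's standard `pickle` module as the serializer (the thread itself names no particular serialization library), plain data round-trips fine but a closure that captures state does not. `make_counter` and `can_pickle` are hypothetical names for this sketch.

```python
import pickle


def make_counter(start):
    """Return a closure that captures and mutates `count`."""
    count = start

    def counter():
        nonlocal count
        count += 1
        return count

    return counter


def can_pickle(obj) -> bool:
    """Report whether pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for local functions varies by Python version.
        return False


counter = make_counter(10)

plain_ok = can_pickle({"user": "bob", "hits": 3})  # plain data structure
closure_ok = can_pickle(counter)                   # closure over `count`

print(plain_ok, closure_ok)
```

Anything you want to put in a network cache has to pass through a step like `pickle.dumps`, so objects holding closures need to be flattened into plain data first. A shared address space would avoid that step.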


but you are going to check automated shared memory. i don't see how it can
work fast :)
if you do any testing, please post some benchmarking results here.


I don't have spare machines to play around with at the moment. But if
anyone has done something similar before, any info would be
appreciated.

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"I am everything you want and I am everything you need")

Cheers!

Bob
