Re: OT: why do web BBS's and blogs get so slow?

From: A.M. Kuchling (amk_at_amk.ca)
Date: 02/01/04


Date: Sat, 31 Jan 2004 21:46:59 -0600

On 31 Jan 2004 14:56:15 -0800,
        Paul Rubin <> wrote:
> an ISP on a fast computer with plenty of net bandwidth. I'm wondering
> what those programs are doing, that makes them bog down so badly.
> Anyone know what the main bottlenecks are? I'm just imagining them
> doing a bunch of really dumb things.

Oh, interesting! I'm sporadically working on a Slashdot clone, so this sort
of thing is a concern. As a result I've poked around in the Slashdot SQL
schema and page design a bit.

Skipping ahead:
> Am I being naive and/or
> missing something important? Slashdot itself uses a tremendous amount
> of hardware by comparison.

Additional points I can think of:

* Some of that slowness may be latency on the client side, not the server. A
  heavily table-based layout may require that the client get most or all of
  the HTML before rendering it. Slashdot's HTML is a nightmare of tables;
  some weblogs have CSS-based designs that are much lighter-weight.

* To build the top page, Slashdot requires a lot of SQL queries. There's
  the list of stories itself, but there are also lists of subsections (Apache,
  Apple, ...), lists of stories in some subsections (YRO, Book reviews, older
  stories), icons for the recent stories, etc. All of these may need an SQL
  query, or at least a lookup in some kind of cache.
  
  It also displays counts of posts to each story (206 of 319 comments),
  but I don't think it's doing queries for these numbers; instead there
  are extra columns in various SQL tables that cache this information
  and get updated somewhere else.

* I suspect the complicated moderation features chew up a lot of time. You
  take +1 or -1 votes from people, and then have to look up information
  about the person, and then look at how people assess this person's
  moderation... It's not doing this on every hit, but this feature
  probably has *some* cost.

* There are lots of anti-abuse features, because Slashdot takes a lot
  of punishment from bozos. Perhaps the daily traffic is 10,000 messages
  that get displayed plus another 10,000 messages that need to be filtered
  out but consume database space nonetheless.

* Slashcode actually implements a pretty generic web application system that
  runs various templates and stitches together the output. A Slashcode
  "theme" consists of the templates, DB queries, and cron jobs that make up
  a site; you could write a Slashcode theme that was amazon.com or any other
  web application, in theory. However, only one theme has ever been
  written, AFAICT: the one used to run Slashdot. (Some people have taken
  this theme and tweaked it in small stylistic ways, but that's a matter of
  editing this one theme, not creating a whole new one.) This adds an
  extra level of interpretation because the site is running these templates
  all the time.
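  A minimal sketch of the cached-count idea from the second point, using
  Python's standard sqlite3 module. The tables and columns
  (`stories.comment_count` and so on) are hypothetical, not taken from the
  real Slashdot schema. The point is only that the counter is bumped in the
  same transaction as the comment insert, so building the front page never
  runs COUNT(*) over the comments table. It keeps a single total, not the
  "206 of 319" pair.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Slashdot's.
SCHEMA = """
CREATE TABLE stories (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    comment_count INTEGER NOT NULL DEFAULT 0  -- cached, denormalized count
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    story_id INTEGER NOT NULL REFERENCES stories(id),
    body TEXT NOT NULL
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def post_comment(conn, story_id, body):
    """Insert a comment and bump the cached counter in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO comments (story_id, body) VALUES (?, ?)",
            (story_id, body),
        )
        conn.execute(
            "UPDATE stories SET comment_count = comment_count + 1 WHERE id = ?",
            (story_id,),
        )

def front_page(conn):
    """Read titles and counts without counting rows in the comments table."""
    return conn.execute(
        "SELECT title, comment_count FROM stories ORDER BY id"
    ).fetchall()

if __name__ == "__main__":
    conn = open_db()
    with conn:
        conn.execute("INSERT INTO stories (id, title) VALUES (1, 'First story')")
        conn.execute("INSERT INTO stories (id, title) VALUES (2, 'Second story')")
    for text in ("first!", "me too", "+1 insightful"):
        post_comment(conn, 1, text)
    print(front_page(conn))  # [('First story', 3), ('Second story', 0)]
```

  The trade-off is the one described above: reads get cheap, and every
  code path that adds or deletes a comment has to remember to keep the
  counter in sync.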
  
> 3) The message store would be two files, one for metadata and one for
> message text. Both of these would be mmap'd into memory. There would
> be a fixed length of metadata for each message, so getting the
> metadata for message #N would be a single array lookup. The metadata

I like this approach, though I think you'd need more files of metadata, e.g.
the discussion of story #X starts with message #Y.
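A rough sketch of that layout, under assumptions of my own: the 20-byte
record format (story id, parent message number, body offset, body length),
the file names `meta`, `text` and `stories`, and the `write_store` and
`MessageStore` names are all hypothetical. The third file is the kind of
extra metadata mentioned above, mapping story #X to its first message #Y.
Every lookup is just an offset computation into an mmap'd file.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: story id, parent message number (-1 = top level),
# byte offset and length of the body in the text file. 20 bytes per message.
META = struct.Struct("<IiQI")
# Hypothetical extra index: entry X holds the first message number of story X.
FIRST = struct.Struct("<I")
NO_MESSAGE = 0xFFFFFFFF

def write_store(dirname, messages):
    """messages: non-empty list of (story_id, parent, body_bytes) in message order."""
    first = {}
    offset = 0
    with open(os.path.join(dirname, "meta"), "wb") as meta, \
         open(os.path.join(dirname, "text"), "wb") as text:
        for n, (story, parent, body) in enumerate(messages):
            meta.write(META.pack(story, parent, offset, len(body)))
            text.write(body)
            offset += len(body)
            first.setdefault(story, n)
    with open(os.path.join(dirname, "stories"), "wb") as idx:
        for story in range(max(first) + 1):
            idx.write(FIRST.pack(first.get(story, NO_MESSAGE)))

class MessageStore:
    """Read-only view over the three files; assumes none of them is empty."""

    def __init__(self, dirname):
        self._files = [open(os.path.join(dirname, name), "rb")
                       for name in ("meta", "text", "stories")]
        # Length 0 maps the whole file.
        self._maps = [mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                      for f in self._files]
        self.meta, self.text, self.stories = self._maps

    def metadata(self, n):
        # Fixed-length records: message #n is one offset computation away.
        return META.unpack_from(self.meta, n * META.size)

    def body(self, n):
        _story, _parent, offset, length = self.metadata(n)
        return self.text[offset:offset + length]

    def first_message(self, story):
        (n,) = FIRST.unpack_from(self.stories, story * FIRST.size)
        return None if n == NO_MESSAGE else n

    def close(self):
        for m in self._maps:
            m.close()
        for f in self._files:
            f.close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_store(d, [(0, -1, b"Story zero, first post"),
                        (1, -1, b"Story one, first post"),
                        (1, 1, b"A reply")])
        store = MessageStore(d)
        print(store.first_message(1), store.body(2))  # 1 b'A reply'
        store.close()
```

Note that "first message of story X" only tells you where a discussion
starts; if messages from different stories are interleaved in the store,
finding the rest of them needs yet another index.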

(Note that this is basically how Metakit works: it mmaps a region of memory
and copies data around, providing a table-like API and letting you add and
remove columns easily. It might be easier to use Metakit than to reinvent a
similar system from scratch. Anyone know if this is also how SQLite works?)

Maybe threading would be a problem with fixed-length metadata records. It
would be fixed-length if you store a pointer in each message to its parent,
but to display a message thread you really want to chase pointers in the
opposite direction, from message to children. But a message can have an
arbitrary number of children, so you can't store such pointers and have
fixed-length records any more.
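One workaround, sketched here as an assumption rather than anything from a
real system: keep only the fixed-length parent pointers on disk and invert
them into variable-length child lists in memory, with one linear pass, at
the moment a thread is displayed. The function names are hypothetical. This
doesn't remove the cost, it just moves it from storage to display time.

```python
from collections import defaultdict

def build_children(parents):
    """parents[n] is the parent of message n, or -1 for a top-level post.
    Returns a dict mapping each message number (and -1) to its children,
    in posting order. Built with one linear pass over the records."""
    children = defaultdict(list)
    for n, parent in enumerate(parents):
        children[parent].append(n)
    return children

def render_thread(parents, bodies):
    """Yield (depth, body) pairs in threaded, depth-first display order."""
    children = build_children(parents)
    # Push in reverse so the earliest post is popped (displayed) first.
    stack = [(0, n) for n in reversed(children[-1])]
    while stack:
        depth, n = stack.pop()
        yield depth, bodies[n]
        stack.extend((depth + 1, c) for c in reversed(children[n]))

if __name__ == "__main__":
    parents = [-1, 0, 0, -1, 1]
    bodies = ["root A", "reply 1", "reply 2", "root B", "reply to reply 1"]
    for depth, body in render_thread(parents, bodies):
        print("  " * depth + body)
```

With the sample data this prints "root A", then "reply 1" with its own
reply nested beneath it, then "reply 2", then "root B", each indented by
depth.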

In my project, discussions haven't been implemented yet, so I have no
figures to present.

--amk