OT: why do web BBS's and blogs get so slow?

A.M. Kuchling amk at amk.ca
Sat Jan 31 22:46:59 EST 2004


On 31 Jan 2004 14:56:15 -0800, 
	Paul Rubin wrote:
> an ISP on a fast computer with plenty of net bandwidth.  I'm wondering
> what those programs are doing, that makes them bog down so badly.
> Anyone know what the main bottlenecks are?  I'm just imagining them
> doing a bunch of really dumb things.

Oh, interesting!  I'm sporadically working on a Slashdot clone, so this sort
of thing is a concern.  As a result I've poked around in the Slashdot SQL
schema and page design a bit.

Skipping ahead:
> Am I being naive and/or
> missing something important?  Slashdot itself uses a tremendous amount
> of hardware by comparison.

Additional points I can think of:

* Some of that slowness may be latency on the client side, not the server. A
  heavily table-based layout may require that the client get most or all of
  the HTML before rendering it.  Slashdot's HTML is a nightmare of tables;
  some weblogs have CSS-based designs that are much lighter-weight.

* To build the top page, Slashdot requires a lot of SQL queries.  There's
  the list of stories itself, but there are also lists of subsections (Apache,
  Apple, ...), lists of stories in some subsections (YRO, Book reviews, older
  stories), icons for the recent stories, etc. All of these may need an SQL
  query, or at least a lookup in some kind of cache.
  
  It also displays counts of posts to each story (206 of 319 comments),
  but I don't think it's doing queries for these numbers; instead there 
  are extra columns in various SQL tables that cache this information 
  and get updated somewhere else.  

* I suspect the complicated moderation features chew up a lot of time.  You
  take +1 or -1 votes from people, and then have to look up information
  about the person, and then look at how people assess this person's 
  moderation...  It's not doing this on every hit, but this feature
  probably has *some* cost.  

* There are lots of anti-abuse features, because Slashdot takes a lot 
  of punishment from bozos.  Perhaps the daily traffic is 10,000 messages
  that get displayed plus another 10,000 messages that need to be filtered
  out but consume database space nonetheless.

* Slashcode actually implements a pretty generic web application system that
  runs various templates and stitches together the output.  A Slashcode
  "theme" consists of the templates, DB queries, and cron jobs that make up
  a site; you could write a Slashcode theme that was amazon.com or any other
  web application, in theory.  However, only one theme has ever been
  written, AFAICT: the one used to run Slashdot. (Some people have taken
  this theme and tweaked it in small stylistic ways, but that's a matter of
  editing this one theme, not creating a whole new one.)  This adds an
  extra level of interpretation because the site is running these templates
  all the time.
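
To make the cached-count point above concrete, here is a minimal sketch using
Python's sqlite3 module.  The table and column names are invented for
illustration and are not taken from the Slashcode schema; the idea is only
that the count lives in the stories table and is updated when a comment is
posted, so the front page never needs a COUNT(*) per story.

```python
import sqlite3

# Hypothetical schema: names are assumptions, not Slashcode's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        comment_count INTEGER NOT NULL DEFAULT 0   -- cached, denormalized
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        story_id INTEGER NOT NULL REFERENCES stories(id),
        body TEXT NOT NULL
    );
""")


def add_comment(conn, story_id, body):
    """Insert a comment and bump the cached count in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO comments (story_id, body) VALUES (?, ?)",
            (story_id, body),
        )
        conn.execute(
            "UPDATE stories SET comment_count = comment_count + 1 WHERE id = ?",
            (story_id,),
        )


def front_page(conn):
    """One query for the story list; the counts come along for free."""
    return conn.execute(
        "SELECT id, title, comment_count FROM stories ORDER BY id DESC"
    ).fetchall()


conn.execute("INSERT INTO stories (id, title) VALUES (1, 'First story')")
for text in ("first post", "me too", "reply"):
    add_comment(conn, 1, text)
```

The write path gets slightly more expensive, but writes are rare compared
with front-page hits, which is presumably why a site would cache the number.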
  
> 3) The message store would be two files, one for metadata and one for
> message text.  Both of these would be mmap'd into memory.  There would
> be a fixed length of metadata for each message, so getting the
> metadata for message #N would be a single array lookup.  The metadata

I like this approach, though I think you'd need more files of metadata, e.g.
the discussion of story #X starts with message #Y.  
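
A rough sketch of the two-file store described in the quoted text, using the
standard mmap and struct modules.  The record layout (text offset, text
length, parent message number) and all function names are my assumptions,
not something specified in the thread; the point is just that message #N is
found by multiplying N by a fixed record size.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-length record (an assumption, not from the thread):
# text offset (uint64), text length (uint32), parent message (int32, -1 = none).
RECORD = struct.Struct("<QIi")


def write_store(meta_path, text_path, messages):
    """messages is a list of (parent, text) tuples; message #N is index N."""
    offset = 0
    with open(meta_path, "wb") as meta, open(text_path, "wb") as text:
        for parent, body in messages:
            data = body.encode("utf-8")
            meta.write(RECORD.pack(offset, len(data), parent))
            text.write(data)
            offset += len(data)


def read_message(meta_map, text_map, n):
    """Fetch message #N: one fixed-size slot lookup, then one text slice."""
    offset, length, parent = RECORD.unpack_from(meta_map, n * RECORD.size)
    return parent, text_map[offset:offset + length].decode("utf-8")


def demo():
    """Write three messages to temporary files and read them back via mmap."""
    messages = [
        (-1, "Why so slow?"),
        (0, "Too many SQL queries."),
        (0, "Table-heavy HTML."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        meta_path = os.path.join(tmp, "meta.dat")
        text_path = os.path.join(tmp, "text.dat")
        write_store(meta_path, text_path, messages)
        with open(meta_path, "rb") as mf, open(text_path, "rb") as tf:
            meta_map = mmap.mmap(mf.fileno(), 0, access=mmap.ACCESS_READ)
            text_map = mmap.mmap(tf.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return [
                    read_message(meta_map, text_map, n)
                    for n in range(len(messages))
                ]
            finally:
                meta_map.close()
                text_map.close()
```

A per-story index (story #X starts at message #Y) would be a third file of
fixed-length records along the same lines.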

(Note that this is basically how Metakit works: it mmaps a region of memory
and copies data around, providing a table-like API and letting you add and
remove columns easily.  It might be easier to use Metakit than to reinvent a
similar system from scratch.  Anyone know if this is also how SQLite works?)

Maybe threading would be a problem with fixed-length metadata records.  The
records stay fixed-length if you store a pointer in each message to its
parent, but to display a message thread you really want to chase pointers in
the opposite direction, from message to children. But a message can have an
arbitrary number of children, so you can't store such pointers and have
fixed-length records any more.
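
One possible way to live with this, sketched below under the assumption that
a discussion's parent pointers can be scanned in one pass, is to keep only
the parent pointer in each record and derive the child lists in memory when
a thread is displayed.  The function names and the -1 "no parent" marker are
made up for the example.

```python
from collections import defaultdict


def build_children(parents):
    """parents[n] is the parent of message n, or -1 for a top-level message."""
    children = defaultdict(list)
    for msg, parent in enumerate(parents):
        children[parent].append(msg)
    return children


def thread_order(children, root=-1, depth=0):
    """Yield (message, depth) pairs in depth-first display order."""
    for msg in children.get(root, []):
        yield msg, depth
        yield from thread_order(children, msg, depth + 1)


# Messages 1 and 2 reply to 0, message 3 replies to 1, message 4 is top-level.
parents = [-1, 0, 0, 1, -1]
ordered = list(thread_order(build_children(parents)))
```

The cost is a scan over every message in the discussion per display, which
may or may not be acceptable for very large threads.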

In my project discussions haven't been implemented yet, so I have no 
figures to present.

--amk


