OT: why do web BBS's and blogs get so slow?

Paul Rubin
Sat Jan 31 23:37:53 EST 2004


"A.M. Kuchling" <amk at amk.ca> writes:
> * Some of that slowness may be latency on the client side, not the server. A
>   heavily table-based layout may require that the client get most or all of
>   the HTML before rendering it.  Slashdot's HTML is a nightmare of tables;
>   some weblogs have CSS-based designs that are much lighter-weight.

Yeah, I'm specifically concerned about how servers get overloaded.

> * To build the top page, Slashdot requires a lot of SQL queries.
>   There's the list of stories itself, but there are also lists of
>   subsections (Apache, Apple, ...), lists of stories in some
>   subsections (YRO, Book reviews, older stories), icons for the
>   recent stories, etc. All of these may need an SQL query, or at
>   least a lookup in some kind of cache.

That's what I'm saying--in a decently designed system, a common
operation like a front page load should take at most one SQL query, to
get the user preferences.  The rest should be in memory.  The user's
preferences can also be cached in memory so they'd be available with
no SQL queries if the user has connected recently.  On Slashdot, an
LRU cache of preferences for 10,000 or so users would probably
eliminate at least half those lookups, since those are the ones who
keep hitting reload.
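Roughly what I mean, as a minimal sketch (the `load_prefs_from_sql` body
and the preference fields are made up for illustration; the point is that
only a cache miss costs an SQL query):

```python
from functools import lru_cache

db_queries = 0  # counts how often we actually hit the database

@lru_cache(maxsize=10_000)          # keep the 10k most recently seen users
def get_prefs(user_id):
    """Return a user's display preferences, hitting SQL only on a miss."""
    global db_queries
    db_queries += 1                 # this stands in for the one SQL query
    return {"user": user_id, "threshold": 1}   # hypothetical prefs record

get_prefs(42)    # miss: one query
get_prefs(42)    # hit: served from memory, no query
get_prefs(7)     # miss: second query
```

Anyone hammering reload gets served entirely from the cache after the
first hit.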

>   It also displays counts of posts to each story (206 of 319 comments),
>   but I don't think it's doing queries for these numbers; instead there 
>   are extra columns in various SQL tables that cache this information 
>   and get updated somewhere else.  

That stuff would just be in memory.

> * I suspect the complicated moderation features chew up a lot of time.  You
>   take +1 or -1 votes from people, and then have to look up information
>   about the person, and then look at how people assess this person's 
>   moderation...  It's not doing this on every hit, though, but this feature
>   probably has *some* cost.  

Nah, that's insignificant.  Of 10k messages a day, maybe 1/3 get
moderated at all, but some get moderated more than once, so there are
maybe 5k moderations a day.  Each is just an update to the article's
metadata, at a cost close to zero.

> * There are lots of anti-abuse features, because Slashdot takes a lot 
>   of punishment from bozos.  Perhaps the daily traffic is 10,000
>   that get displayed plus another 10,000 messages that need to be filtered
>   out but consume database space nonetheless.

I think 10,000 total is on the high side, just adding up the number of
comments on a typical day.

> * Slashcode actually implements a pretty generic web application system that
>   runs various templates and stitches together the output.

Yeah, I think it's doing way too much abstraction, too much SQL, etc.

> > 3) The message store would be two files, one for metadata and one for
> > message text.  Both of these would be mmap'd into memory.  There would
> > be a fixed length of metadata for each message, so getting the
> > metadata for message #N would be a single array lookup.  The metadata
> 
> I like this approach, though I think you'd need more files of metadata, e.g.
> the discussion of story #X starts with message #Y.  

Basically, a story would just be a special type of message, indicated
by some field in its metadata.  Another field for each message would
say where the next message in that story was (or a special marker if
it's the last message).  So the messages would be in a linked list.
You'd remember in memory where the last message of each story is, so
you could append easily.  You'd also make an SQL record for each story
so you can find the stories again on server restart.
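Here's a rough sketch of that scheme, with a bytearray standing in for
the mmap'd metadata file (the record layout--flags, parent, next--and the
helper names are my invention, not a real format):

```python
import struct

# One fixed-length metadata record per message: (flags, parent, next).
# Flags bit 0 marks "this is a story"; next == -1 marks the end of the list.
REC = struct.Struct("<iii")
metadata = bytearray()      # stands in for the mmap'd metadata file
last_in_story = {}          # in-memory: story id -> index of its last message

def _append(flags, parent):
    idx = len(metadata) // REC.size
    metadata.extend(REC.pack(flags, parent, -1))
    return idx

def new_story():
    idx = _append(1, -1)            # a story is just a flagged message
    last_in_story[idx] = idx
    return idx

def add_message(story, parent):
    idx = _append(0, parent)
    prev = last_in_story[story]
    # Patch the previous tail's "next" pointer to the new message, so the
    # story's messages form a linked list in chronological order.
    flags, par, _ = REC.unpack_from(metadata, prev * REC.size)
    REC.pack_into(metadata, prev * REC.size, flags, par, idx)
    last_in_story[story] = idx
    return idx

story = new_story()
m1 = add_message(story, story)
m2 = add_message(story, m1)
```

Appends are O(1) because the tail index lives in memory; restart recovery
is what the per-story SQL record is for.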

> (Note that this is basically how Metakit works: it mmaps a region of
> memory and copies data around, provided a table-like API and letting
> you add and remove columns easily.  It might be easier to use
> Metakit than to reinvent a similar system from scratch.  Anyone know
> if this is also how SQLite works?)

Wow cool, I should look at Metakit.

> Maybe threading would be a problem with fixed-length metadata records.  It
> would be fixed-length if you store a pointer in each message to its parent,
> but to display a message thread you really want to chase pointers in the
> opposite direction, from message to children. But a message can have an
> arbitrary number of children, so you can't store such pointers and have
> fixed-length records any more.

Each metadata record would have a pointer to its parent, and another
pointer to the chronologically next record in that story.  So you'd
read in a story by scanning down the linear chronological list, using
the parent pointers to build tree structure in memory as you go.  If
you cache a few dozen of these trees, you shouldn't have to do those
scans very often (you'd do one if a user visits a very old story whose
tree is not in cache).  
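The scan is one pass, something like this (message ids and parents here
are made up; in the real thing they'd come from the metadata records):

```python
def build_tree(messages):
    """Rebuild a story's reply tree from one chronological scan.

    messages: list of (msg_id, parent_id) pairs in chronological order,
    parent_id None for the story's root.  Because a reply always comes
    after its parent, each parent is already in the table when we see
    its children.
    """
    children = {}
    for msg_id, parent in messages:
        children.setdefault(msg_id, [])       # every message gets a slot
        if parent is not None:
            children[parent].append(msg_id)   # parent was seen earlier
    return children

# root 1, with replies 2 and 3; 4 replies to 2
tree = build_tree([(1, None), (2, 1), (3, 1), (4, 2)])
```

Each child list comes out in chronological order for free, which is what
you want for display.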
