OT: why are LAMP sites slow?

Jack Diederich jackdied at jackdied.com
Fri Feb 4 10:59:29 EST 2005


On Thu, Feb 03, 2005 at 10:09:49PM -0800, Paul Rubin wrote:
> aurora <aurora00 at gmail.com> writes:
> > I'm lost. So what do you compare against when you say LAMP is slow?
> > What is the reference point? Is it just a general observation that
> > slashdot is slower than we'd like it to be?
[reordered Paul's email a bit]

> > If you mean MySQL or SQL databases in general are slow, there is
> > some truth in it. The best things about an SQL database are concurrent
> > access, transactional semantics and versatile querying. It turns out
> > a lot of applications can really live without those. If you can
> > rearchitect the application using flat files instead of a database it
> > can often be a big boon.
> 
> This is the kind of answer I had in mind.

*ding*ding*ding*  The mistake I've made most frequently is using a
database in applications.  YAGNI.  Using a database at all has its
own overhead.  Using a database badly is deadly.  Most sites would
benefit from ripping out the database and doing something simpler.
Refactoring a database on a live system is a giant pain in the ass;
simpler file-based approaches make incremental updates easier.

The Wikipedia example has been thrown around, and I haven't looked at
the code either, but except for search, why would they need a database
to look up an individual WikiWord?  Going to the database requires
reading an index when pickle.load(open('words/W/WikiWord', 'rb')) would
seem sufficient.
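
A minimal sketch of that file-per-page lookup, assuming one pickle file
per WikiWord, sharded into a directory named after its first letter.
The function names and the layout are hypothetical, chosen only to match
the 'words/W/WikiWord' path above; this is not how Wikipedia stores
pages.

```python
import os
import pickle

def page_path(root, word):
    # Assumed layout: <root>/<first letter>/<WikiWord>
    if not word.isalnum():
        # Refuse anything that could escape the directory (e.g. '../x').
        raise ValueError('not a plain WikiWord: %r' % word)
    return os.path.join(root, word[0].upper(), word)

def save_page(root, word, page):
    path = page_path(root, word)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(page, f)
    # Rename over the old file so readers never see a half-written page.
    os.replace(tmp, path)

def load_page(root, word):
    # One open() and one read; no index lookup beyond the filesystem's own.
    with open(page_path(root, word), 'rb') as f:
        return pickle.load(f)
```

With this layout, load_page('words', 'WikiWord') reads exactly the file
words/W/WikiWord.  Search would still need something else, as noted
above.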

> Yes, that's the basic observation, not specifically Slashdot, but
> lots of LAMP sites (some PHPBB sites are other examples) have the same
> behavior.  You send a url and the server has to grind for quite a
> while coming up with the page, even though it's pretty obvious what
> kinds of dynamic stuff it needs to find.  Just taking a naive approach
> with no databases but just doing everything with in-memory structures
> (better not ever crash!) would make me expect a radically faster site.
> For a site like Slashdot, which gets maybe 10 MB of comments a day,
> keeping them all in RAM isn't excessive.  (You'd also dump them
> serially to a log file, no seeking or index overhead as this happened.
> On server restart you'd just read the log file back into ram).

You're preaching to the choir.  I don't use any of the fancy stuff in
Twisted, but its single-threaded nature means I can keep everything in
RAM and just serialize changes to disk (to survive a restart).
This allows you to do very naive things and pay no penalty.  My homespun
blogging software isn't as full-featured as PyBlosxom but it is a few
hundred times(!) faster.  PyBlosxom pays a high price in file stats
because it allows running under CGI.  Mine would too if it ran as a CGI,
but it doesn't, so *shrug*.
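
The keep-it-in-RAM, log-changes-to-disk approach can be sketched roughly
as below.  This is an illustration under assumptions, not the blogging
software described above: the LoggedStore class, the JSON-lines log
format and the method names are all made up, it assumes a single
thread, and it does not handle a final line torn by a crash mid-write.

```python
import json

class LoggedStore:
    """Hold all data in a dict; append every change to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        # Append mode: writes are sequential, no seeking, no index.
        self._log = open(log_path, 'a')

    def _replay(self):
        # On restart, read the log front to back; later entries win.
        try:
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        except FileNotFoundError:
            pass  # first run, nothing to replay

    def set(self, key, value):
        # Log first, then update RAM.  flush() hands the line to the OS;
        # it is not an fsync, so a power loss can still drop it.
        self._log.write(json.dumps([key, value]) + '\n')
        self._log.flush()
        self.data[key] = value

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

    def close(self):
        self._log.close()
```

The log only grows in this sketch; a real version would want to rewrite
it as a snapshot now and then.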

> > A lot of this is just implementation. Find the right tool and the
> > right design for the job. I still don't see a case that a LAMP-based
> > solution is inherently slow.
> 
> I don't mean LAMP is inherently slow, I just mean that a lot of
> existing LAMP sites are observably slow.

A lot of this is just implementation.  Going the dumb non-DB way won't
prevent you from making bad choices, but if a lot of bad choices are made
simply because of the DB (my assertion), dropping the DB would avoid
some of them.  I think Sourceforge has one table for all projects'
bugs & patches.  That means a never-used project's bugs take up space
in the index and slow down access to the popular projects.  Would a
naive file-based implementation have been just as bad?  Maybe.
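
For comparison, a naive file-based layout might keep each project's bugs
in its own file, so that reading a popular project's bugs never touches
data belonging to an unused one.  Everything here is hypothetical (the
function names, the bugs.json file, the assumption that a project name
is a safe directory name) and says nothing about Sourceforge's actual
schema.

```python
import json
import os

def bugs_path(root, project):
    # Assumed layout: <root>/<project>/bugs.json
    return os.path.join(root, project, 'bugs.json')

def load_bugs(root, project):
    # Only this project's file is read; other projects cost nothing here.
    try:
        with open(bugs_path(root, project)) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def add_bug(root, project, summary):
    bugs = load_bugs(root, project)
    bug_id = len(bugs) + 1  # ids are per project in this sketch
    bugs.append({'id': bug_id, 'summary': summary})
    os.makedirs(os.path.join(root, project), exist_ok=True)
    # Rewrites the whole file on every change: naive, and slow for a
    # project with a very long bug list.
    with open(bugs_path(root, project), 'w') as f:
        json.dump(bugs, f)
    return bug_id
```

The directory of projects is still an index of sorts, so whether this
ends up better than the single table depends on the filesystem and the
access pattern.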

If there is interest I'll follow up with some details on my own LAMP
software, which does live reports on gigs of data and - you guessed it -
I regret that it is database-backed.  That story also involves why I
started using Python (the prototype was in PHP).

-Jack


