OT: why are LAMP sites slow?

Steve Holden steve at holdenweb.com
Fri Feb 4 08:10:03 EST 2005


Paul Rubin wrote:

> Skip Montanaro <skip at pobox.com> writes:
> 
>>It's more than a bit unfair to compare Wikipedia with Ebay or
>>Google.  Even though Wikipedia may be running on high-performance
>>hardware, it's unlikely that they have anything like the underlying
>>network structure (replication, connection speed, etc), total number
>>of cpus or monetary resources to throw at the problem that both Ebay
>>and Google have.  I suspect money trumps LAMP every time.
> 
> 
> I certainly agree about the money and hardware resource comparison,
> which is why I thought the comparison with 1960's mainframes was
> possibly more interesting.  You could not get anywhere near the
> performance of today's servers back then, no matter how much money you
> spent.  Re connectivity, I wonder what kind of network speed is
> available to sites like Ebay that's not available to Jane Webmaster
> with a colo rack at some random big ISP.  Also, you and Tim Danieliuk
> both mentioned caching in the network (e.g. Akamai).  I'd be
> interested to know exactly how that works and how much difference it
> makes.
> 
It works by distributing content across end-nodes distributed throughout 
the infrastructure. I don't think Akamai make any secret of their 
architecture, so Google (:-) can help you there.

Of course it makes a huge difference, otherwise Google wouldn't have 
registered their domain name as a CNAME for an Akamai node set.

[OB PyCon] Jeremy Hylton, a Google employee and formerly a PythonLabs 
employee, will be at PyCon. Why don't you come along and ask *him*?

> But the problems I'm thinking of are really obviously with the server
> itself.  This is clear when you try to load a page and your browser
> immediately gets the static text on the page, followed by a pause while
> the server waits for the dynamic stuff to come back from the database.
> Serving a Slashdotting-level load of pure static pages on a small box
> with Apache isn't too terrible ("Slashdotting" = the spike in hits
> that a web site gets when Slashdot's front page links to it).  Doing
> that with dynamic pages seems to be much harder.  Something is just
> bugging me about this.  SQL servers provide a lot of capability (ACID
> for complicated queries and transactions, etc.) that most web sites
> don't really need.  They pay the price in performance anyway.
> 
Well there's nothing wrong with caching dynamic content when the 
underlying database isn't terribly volatile and there is no critical 
requirement for the absolute latest data. Last I heard Google weren't 
promising that their searches are up-to-the-microsecond in terms of 
information content.
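
To make the caching point concrete, here is a minimal sketch of a time-based cache for rendered pages. It is an illustration only, not how Google or any particular site does it: the TTLCache class, its get_or_render method and the render callable are all hypothetical names, and the 30-second lifetime is an arbitrary assumption about how stale a page is allowed to get.

```python
import time


class TTLCache:
    """Keep rendered pages for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry logic can be tested
        self._store = {}     # key -> (expiry_time, rendered_value)

    def get_or_render(self, key, render):
        """Return the cached value for key, calling render() only when stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database round trip
        value = render()     # missing or expired: do the expensive work once
        self._store[key] = (now + self.ttl, value)
        return value


# Usage: wrap whatever builds the page from the database.
cache = TTLCache(ttl_seconds=30)
page = cache.get_or_render("/front", lambda: "<html>built from SQL</html>")
```

Every request inside the 30-second window is served from memory, so the database sees at most one query per page per window regardless of how many hits arrive.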

In terms of a network like Google's, talking about "the server" doesn't 
really make sense: as Sun Microsystems have been saying for about twenty 
years now, "the network is the computer". There isn't "a server", it's 
"a distributed service with multiple centers of functionality".
> 
>>We also know Google has thousands of CPUs (I heard 5,000 at one point and
>>that was a couple years ago).
> 
> 
> It's at least 100,000 and probably several times that ;-).  I've heard
> that every search query does billions of cpu operations and crunches
> through 100's of megabytes of data (search on "apple banana" and there
> are hundreds of millions of pages with each word, so two lists of that
> size must be intersected).  100,000 was the published number of
> servers several years ago, and there were reasons to believe that they
> were purposely understating the real number.

So what's all this about "the server", then? ;-)

regards
  Steve
-- 
Meet the Python developers and your c.l.py favorites March 23-25
Come to PyCon DC 2005                      http://www.pycon.org/
Steve Holden                           http://www.holdenweb.com/


