How to find bad row with db api executemany()?

Chris Angelico rosuav at gmail.com
Fri Mar 29 23:57:56 EDT 2013


On Sat, Mar 30, 2013 at 2:36 PM, Roy Smith <roy at panix.com> wrote:
> In article <mailman.3981.1364613280.2939.python-list at python.org>,
>  Chris Angelico <rosuav at gmail.com> wrote:
>
>> Hmm. I heard around the forums that Amazon weren't that great at disk
>> bandwidth anyway, and that provisioning IO was often a waste of money.
>
> Au contraire.  I guess it all depends on what you're doing.  If you're
> CPU bound, increasing your I/O bandwidth won't help.  But, at least on
> our database (MongoDB) servers, we saw a huge performance boost when we
> started going for provisioned IO.
>
> As far as I can tell, on a raw price/performance basis, they're pretty
> expensive.  But, from a convenience standpoint, they're hard to beat.

Yeah, I'm not saying you won't see performance go up - just that it's
going to be expensive compared to what you could do with a dedicated
server. The flexibility costs.

> Case in point: We've been thinking about SSD as our next performance
> step-up.  One day, we just spun up some big honking machine, configured
> it with 2 TB of SSD, and played around for a while.  Wicked fast.  Then
> we shut it down.  That experiment probably cost us $10 or so, and we
> were able to run it on the spur of the moment.

That is one thing the cloud is *awesome* for. "Hmm, I wonder......"
*half an hour later* "Now I know." *bill comes in* "That was cheap."

> Another example was last summer when we had a huge traffic spike because
> of a new product release.  Caught us by surprise how much new traffic it
> would generate.  Our site was in total meltdown.  We were able to spin
> up 10 new servers in an afternoon.  If we had to go out and buy
> hardware, have it shipped to us, figure out where we had rack space,
> power, network capacity, cooling, etc, we'd have been out of business
> before we got back on the air.

Yep. We're looking instead at server rental from big honking data
centers (by the way, I never managed to figure this out - at what size
do things begin to honk? Larger than a bread box?), where hopefully
they'd be able to deploy more servers for us within a matter of hours.
Not as good as the cloud (where you can have more servers in minutes),
but you described "in an afternoon" as your success story, so I'm
guessing most of the delay was administrative - the decision to
actually go ahead and spin those servers up. Unless you're automating
the whole thing (which is possible with the Amazon cloud, but not
every client will do it), that decision will always cost you time.
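
For what it's worth, a minimal sketch of that kind of automation with
the boto library (the region, AMI ID, and instance type here are
placeholders, not anything from Roy's setup):

    import boto.ec2

    # boto picks up AWS credentials from the environment or ~/.boto.
    conn = boto.ec2.connect_to_region('us-east-1')

    # Launch ten identical instances in one API call; 'ami-12345678'
    # stands in for whatever pre-baked image you'd actually use.
    reservation = conn.run_instances(
        'ami-12345678',
        min_count=10,
        max_count=10,
        instance_type='m1.large',
    )

    for instance in reservation.instances:
        print('%s is %s' % (instance.id, instance.state))

Hang that off a cron job or a load-triggered script and the
administrative delay shrinks to however long the instances take to
boot.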

> Yet another example.  We just (as in, while I've been typing this) had
> one of our servers go down.  Looks like the underlying hardware the VM
> was running on croaked, because when the instance came back up, it had a
> new IP address.  The whole event was over in a couple of minutes, with
> only minor disruption to the service.  And, presumably, there's some
> piece of hardware somewhere in Virginia that needs repairing, but that's
> not our problem.

Yeah, that's also a concern, and it works beautifully as long as
you're okay with it. It'll also happen more often on AWS than on
dedicated hardware, because there are more components to fail. That's
part of what wouldn't have fit our server layout: we have a layer on
top of all that to handle monitoring (incidentally, we set up a dozen
home-grade laptops as a "server cluster" to test those systems, and
knew within minutes whenever one of the laptops decided it had
crunched its last bits), and we really want stable IP addresses for
DNS and the like. The cloud isn't bad; it was just a bad fit for us.
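
That monitoring layer is conceptually nothing more than a heartbeat
loop. A toy sketch (the host list, port, and intervals are invented
for illustration, not our actual setup):

    import socket
    import time

    HOSTS = [('laptop1', 22), ('laptop2', 22)]  # hypothetical fleet
    TIMEOUT = 5    # seconds to wait for each connection attempt
    INTERVAL = 60  # seconds between sweeps of the whole fleet

    def alive(host, port):
        # A plain TCP connect is a crude but effective liveness probe.
        try:
            socket.create_connection((host, port), timeout=TIMEOUT).close()
            return True
        except socket.error:
            return False

    while True:
        for host, port in HOSTS:
            if not alive(host, port):
                print('ALERT: %s:%d is not responding' % (host, port))
        time.sleep(INTERVAL)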

Side point: You mentioned SSDs. Are you aware of the fundamental
risks associated with them? Only a handful of SSD models are actually
trustworthy for database work. Sure, they're fast, but can you afford
data corruption in the event of a power outage? Most SSDs will
cheerfully lie about fsync() - acknowledging the flush while the data
still sits in a volatile on-drive cache - and thus violate
transactional integrity (ACID compliance and the like).
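
To make that concrete: the durability contract a database relies on
is roughly the pattern below (the filename and payload are invented
for illustration). If the drive acknowledges the fsync() while the
data is still only in its volatile cache, a power cut breaks the
contract:

    import os

    # After fsync() returns, the data is supposed to be on stable
    # storage - that's what a database's commit path depends on.
    fd = os.open('journal.log', os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    os.write(fd, b'COMMIT transaction 42\n')
    os.fsync(fd)   # an honest drive flushes its cache here; a lying
                   # one reports success with the data still in RAM
    os.close(fd)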

ChrisA


