python-noob - which container is appropriate for later exporting into mySql + matplotlib ?

Chris Angelico rosuav at gmail.com
Sat Apr 13 10:03:25 EDT 2013


On Sat, Apr 13, 2013 at 11:30 PM, someone <newsboost at gmail.com> wrote:
> On 04/13/2013 01:39 PM, Chris Angelico wrote:
>> Note that there's a caveat: You have to tell SQLite to be ACID
>> compliant, effectively.
>
>
> So, you're saying to me that by default SQLite isn't ACID compliant, if I
> begin to use it in my own small programs?
> ...
> Do I understand you correctly, that by "You have to tell SQLite to be ACID
> compliant, effectively", you're saying that by default SQLite isn't ACID
> compliant ?
>

First off: I am NOT especially familiar with SQLite. I'm more familiar
with PostgreSQL, DB2, and MySQL. I'm also not an expert in database
engine design, so this discussion is from the point of view of an
applications developer who has used databases from his apps.

True ACID compliance demands support at every level:

1) The application has to operate in logical units of work, which
(except with DB2) requires an explicit "BEGIN" query or
single-statement transactions. (There's a tiny sqlite3 sketch of this
after the list.)

2) The database engine must employ some form of write-ahead log, so
that one way or another there's a means of detecting half-done
transactions. Different databases do this somewhat differently:
according to the page I linked to, SQLite does it in reverse,
maintaining a log that's sufficient to *undo* the transaction, while
PostgreSQL does it forwards, maintaining a log that's sufficient to
*redo* it as well - more effort, but that log can also be used for
database replication.

3) The operating system and filesystem must support a forced file
synchronization (fsync/fdatasync), so the database engine can wait for
the data to be written to disk.

4) The underlying media (hard disk, SSD, USB stick, etc) must respond
to the fsync call by actually writing the content to persistent
storage before returning.
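
Just to illustrate level 1: from Python's sqlite3 module, taking
explicit control of transactions looks something like this. Treat it
as a minimal sketch - the file and table names are invented, and for
what the synchronous pragma actually guarantees I'm going by SQLite's
documentation rather than personal experience:

import sqlite3

# isolation_level=None puts the module in autocommit mode, so we can
# issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect("accounts.db", isolation_level=None)

# SQLite's knob for how hard it tries to get data onto disk; levels 3
# and 4 still depend on the OS and the drive telling the truth.
conn.execute("PRAGMA synchronous = FULL")

conn.execute("BEGIN")           # start a logical unit of work
try:
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    conn.execute("COMMIT")      # both updates become permanent together...
except Exception:
    conn.execute("ROLLBACK")    # ...or neither does
    raise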

Failure at any level means the overall system is not ACID compliant.
PostgreSQL has a huge amount of code in it to try to deal with (or at
least recognize) a level-3 failure, but nothing in the database engine
can deal with level 1 or 4 issues.
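
Level 3, by the way, is easy to picture from Python, since there's
nothing database-specific about it - durability ultimately comes down
to flushing your own buffers and then asking the OS to sync:

import os

with open("journal.log", "ab") as f:
    f.write(b"record of the transaction\n")
    f.flush()               # push Python's buffer out to the OS
    os.fsync(f.fileno())    # ask the OS to push it out to the disk

Only once fsync returns can an engine claim the data is durable - and
even then, only if the drive isn't lying, which is level 4.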

The only way to be sure of your particular stack is to actually test
it. The easiest way is to get two computers side by side, run the
database engine on one, and a monitoring program on the other. To test
some SSDs at work, I knocked together a little program that worked
somewhat like this (a rough sketch in Python follows the list):

* Connect to the database over TCP/IP (easy, as we were doing this
with PostgreSQL)
* Create a table with a number of rows, each with an ID and a counter
initialized to 0
* Repeatedly, in parallel, perform a transaction:
  - Increment the counter on one of the rows (at random)
  - Increment a "possible" in-memory counter for that row
  - Commit the database transaction
  - Increment a "confirmed" in-memory counter for that row
* When a "database seems to be down" error is detected, wait for the
database to come back up, then query the table. Every counter must be
at least its corresponding "confirmed" value and at most its
"possible" value.

With that running, I simply pulled the plug on the database computer.
With a properly-configured hard disk, every one of the counters was
within its correct range. With a lying SSD, though, the results ranged
from "pretty close" (under a low workload - simulated by having only a
single thread doing transactions and having it sleep for a few ms each
iteration) to "pretty appalling" (with a bunch of threads spinning
tightly, keeping the workload high). Once the SSD starts doing major
write reordering, its throughput soars, but at the cost of
trustworthiness.

> Next question: Is it something I should worry about in my own programs (I'm
> not sure, I'm an SQL noob)... ?

Yes, it most certainly is. If you have any data that you care about,
put together some kind of test that lets you literally pull the plug
on the database while still knowing whether or not each transaction
was completed (so you'll most likely need some kind of
"possible"/"confirmed" counter pair like the one I used above).

ChrisA


