Parallel insert to postgresql with thread

Scott David Daniels Scott.Daniels at Acm.Org
Thu Oct 25 22:46:59 EDT 2007


Erik Jones wrote:
> 
> On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
>> Diez B. Roggisch wrote:
>>> Abandoned wrote:
>>>> Hi..
>>>> I use the threading module for the fast operation. But ....
>> [in each thread]
>>>> def save(a,b,c):
>>>>             cursor.execute("INSERT INTO ...
>>>>             conn.commit()
>>>>             cursor.execute(...)
>>>> How can i insert data to postgresql the same moment ?...
>>>
>>> DB modules aren't necessarily thread-safe. Most of the times, a 
>>> connection (and ... cursor) can't be shared between threads.
>>> So open a connection for each thread.
>>
>> Note that your DB server will have to "serialize" your inserts, so
>> ... a single thread through a single connection to the DB is the way 
>> to go.  Of course it (the DB server) may be clever enough to behave 
>> "as if" they are serialized, but most of your work parallelizing at 
>> your end simply creates new work at the DB server end.
> 
> Fortunately, in his case, that's not necessarily true....  If he 
> goes the recommended route with a separate connection for each thread, 
> then Postgres will not serialize multiple inserts coming from separate 
> connections unless there is something like and ALTER TABLE or REINDEX 
> concurrently happening on the table.  
 > The whole serialized inserts thing is strictly something popularized
 > by MySQL and is by no means necessary or standard (as with a lot of
> MySQL).

But he commits after every insert, which _does_ force serialization (if
only to provide safe transaction boundaries).  I understand you can get
clever at how to do it, _but_ preserving ACID properties is exactly what
I mean by "serialize," and while I like to bash MySQL as well as the
next person, I most certainly am not under the evil sway of the vile
MySQL cabal.

The server will have to be able to abort each transaction
_independently_ of the others, and so must serialize any index
updates that share a page by, for example, landing in the same node
of a B-Tree.

-Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-list mailing list