Read from database, write to another database, simultaneously

Sean Davis seandavi at gmail.com
Thu Jan 11 07:49:21 EST 2007



On Jan 11, 3:20 am, Laurent Pointal <laurent.poin... at limsi.fr> wrote:
> Bjoern Schliessmann wrote:
>
> > Sean Davis wrote:
>
> >> The author of one of the python database clients mentioned that
> >> using one thread to retrieve the data from the oracle database and
> >> another to insert the data into postgresql with something like a
> >> pipe between the two threads might make sense, keeping both IO
> >> streams busy.
>
> > IMHO he's wrong. Network interaction is quite slow compared with CPU
> > performance, so there's no gain (maybe even overhead due to thread
> > management and locking stuff). That's true even on multiprocessor
> > machines, not only because there's almost nothing to compute but
> > only IO traffic. CMIIW.
>
> Not so sure, there is low CPU in the Python script, but there may be
> CPU+disk activity on the database sides [with cache management and other
> optimizations on disk access].
> So, with a reader thread and a writer thread, he can have a select on a
> database performed in parallel with an insert on the other database.
> After, he must know if the two databases use same disks, same
> controller, same host... or not.

Some more detail:

The machine running the script is distinct from the Oracle machine
which is distinct from the Postgresql machine.  So, CPU usage is low
and because of the independent machines for the database end, it is
definitely possible to read from one database while writing to the
other.  That is the solution that I am looking for, and Dennis's post
seems pretty close to what I need.  I will have to use some kind of
buffer.  A Queue isn't quite right as it stands, as the data is coming
in as records, but for postgres loading, a file-like stream is what I
need, so there will need to be a wrapper around the Queue on the
get() side.  Or is there a better way to go about this detail?  What
seems to make sense to me is to stringify the incoming oracle data into
some kind of buffer and then read on the postgresql side.

Thanks,
Sean



