String concatenation - which is the fastest way?

Chris Angelico rosuav at gmail.com
Thu Aug 11 06:59:31 EDT 2011


On Thu, Aug 11, 2011 at 7:40 AM,  <przemolicc at poczta.fm> wrote:
> I am not a database developer, so I don't want to change the whole process
> of data flow between applications in my company. Another process is
> reading this XML from a particular Oracle table, so I have to put the final XML there.

I think you may be looking at a submission to
http://www.thedailywtf.com/ soon. You seem to be working in a rather
weird dataflow. :( Under the circumstances, you're probably going to
want to go with the original ''.join() option.
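
In concrete terms that means collecting the pieces in a list and
joining once at the end, roughly like this (untested sketch - the
rows list is just a stand-in for however you fetch the data):

    # Build the fragments in a list, then join once at the end -
    # this avoids the quadratic cost of repeated += on strings.
    rows = [('<row>%d</row>' % i,) for i in range(5)]   # stand-in for the cursor
    fragments = []
    for row in rows:
        fragments.append(row[0])
    xml = ''.join(fragments)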

> This server has 256 GB of RAM so memory is not a problem.
> Also, the select which fetches the data is sorted. That is why I have to
> divide it carefully into subtasks and then merge them in the correct order.

There's no guarantee that all of that 256GB is available to you, of course.

The easiest way may be to do the select in a single process, then
partition the result and use the Python multiprocessing module to
split the job into several parts. At the end you need only
concatenate the handful of strings, one per part.
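
Something along these lines, perhaps - untested, and build_chunk /
format_row are made-up stand-ins for however you currently render
the rows as XML:

    from multiprocessing import Pool

    def format_row(row):
        # Stand-in for whatever turns one row into its XML snippet.
        return '<row>%s</row>' % row

    def build_chunk(rows):
        # Render one contiguous slice of the sorted result set.
        return ''.join(format_row(r) for r in rows)

    def split(rows, n):
        # n contiguous slices, so joining the chunks keeps the order.
        size = len(rows) // n + 1
        return [rows[i:i + size] for i in range(0, len(rows), size)]

    if __name__ == '__main__':
        rows = list(range(1000000))    # stand-in for the sorted select
        pool = Pool(processes=4)
        chunks = pool.map(build_chunk, split(rows, 4))
        pool.close()
        pool.join()
        xml = ''.join(chunks)          # the handful of strings

Since pool.map returns its results in the same order as the input,
the final join preserves the sort order from the select.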

You'll need to do some serious profiling, though, to ascertain where
the bottleneck really is. Is the concatenation actually slow, or is
most of the time going to reading and writing the disk? Or is it all
just slow because of RAM usage? Proper string concatenation doesn't
need a huge amount of CPU.
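
A quick way to get a feel for it is to time the join by itself on
data of a realistic size and compare that against your fetch/write
times - a rough sketch, with made-up numbers:

    import cProfile
    import time

    def build_xml(fragments):
        return ''.join(fragments)

    # A million small fragments as a stand-in for the real data.
    fragments = ['<row>%d</row>' % i for i in range(1000000)]

    t0 = time.time()
    xml = build_xml(fragments)
    print('join took %.3fs for %d bytes' % (time.time() - t0, len(xml)))

    # Or profile the whole build to see where the time really goes:
    cProfile.run('xml = build_xml(fragments)')

If the join itself finishes in a fraction of a second, the bottleneck
is almost certainly the database or the disk, not the concatenation.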

ChrisA


