modifying small chunks from long string

Tony Nelson *firstname*nlsnews at georgea*lastname*.com
Mon Nov 14 11:56:44 EST 2005


In article <1131951470.660501.68710 at g44g2000cwa.googlegroups.com>,
 "MackS" <mackstevenson at hotmail.com> wrote:

> Hello everyone
> 
> I am faced with the following problem. For the first time I've asked
> myself "might this actually be easier to code in C rather than in
> python?", and I am not looking at device drivers. : )
> 
> This program is meant to process relatively long strings (10-20 MB) by
> selectively modifying small chunks one at a time. Eg, it locates
> approx. 1000-2000 characters and modifies them. Currently I was doing
> this using a string object but it is getting very slow: although I only
> modify a tiny bit of the string at a time, a new entire string gets
> created whenever I "merge" it with the rest. Eg,
> 
> shortstr = longstr[beg:end]
> 
> # edit shortstr...
> 
> longstr = longstr[:beg] + shortstr + longstr[end:] # new huge string is created!!
> 
> Can I get over this performance problem without reimplementing the
> whole thing using a barebones list object? I thought I was being "smart"
> by avoiding editing the long list, but then it struck me that I am
> creating a second object of the same size when I put the modified
> shorter string in place...

A couple of minutes of experimenting with array.array at the Python command 
line indicates that it will work fine for you.  It is quite snappy on a 16 MB 
array, including a slice assignment of 1 KB near the beginning.  An 
array.array is probably faster than a list and uses less memory.  It is the 
way to go if you are going to be randomly editing all over the place but 
don't need to convert to a string often.
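
For anyone who wants to repeat that experiment, here is a rough sketch of it.
It assumes Python 3, where the 'c' typecode no longer exists, so it uses 'B'
(unsigned bytes) instead. The names SIZE, CHUNK and edit_in_place are made up
for this illustration, and the timing you see will depend on your machine.

```python
import array
import time

SIZE = 16 * 1024 * 1024   # 16 MB, matching the experiment described above
CHUNK = 1024              # 1 KB replacement

def edit_in_place(buf, beg, new_bytes):
    """Overwrite len(new_bytes) items of buf, starting at beg, in place.

    Hypothetical helper: it only wraps a same-length slice assignment.
    """
    buf[beg:beg + len(new_bytes)] = array.array('B', new_bytes)

# Build the big array once; bytes(SIZE) is SIZE zero bytes.
longarray = array.array('B', bytes(SIZE))

start = time.perf_counter()
edit_in_place(longarray, 100, b'x' * CHUNK)   # 1 KB edit near the beginning
elapsed = time.perf_counter() - start

print("slice assignment took %.6f s" % elapsed)
```

Because the replacement has the same length as the slice it overwrites, the
array keeps its size and no second 16 MB object is built.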

UserString.MutableString comes with a warning that it is very slow.  It 
seems to work by having a string data item that it keeps replacing.  I 
didn't try it.


> shortstr = longstr[beg:end]
> 
> # edit shortstr...
> 
> longstr = longstr[:beg] + shortstr + longstr[end:] # new huge string is created!!

Replace this with slice assignment:

import array

longarray = array.array('c', longstr)  # once only, at the beginning!

shortstr = longarray[beg:end].tostring()  # or just edit an array slice

# edit shortstr (or a short array)

longarray[beg:end] = array.array('c', shortstr)  # slice assignment, in place

longstr = longarray.tostring()  # only if you need a string again
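
The snippet above is Python 2 code: the 'c' typecode and tostring() are not
available in current Python 3 releases. A rough equivalent, assuming the data
can be handled as bytes, is to use the built-in bytearray. The helper
edit_chunk and the sample data are invented for this sketch.

```python
def edit_chunk(buf, beg, end, edit):
    """Replace buf[beg:end] in place with edit(chunk).

    buf is a bytearray; edit is any function taking and returning bytes.
    """
    chunk = bytes(buf[beg:end])   # small copy of just the part being edited
    buf[beg:end] = edit(chunk)    # slice assignment modifies buf in place

longstr = b"hello world, " * 4    # stand-in for the 10-20 MB input
longarray = bytearray(longstr)    # once only, at the beginning

edit_chunk(longarray, 0, 5, bytes.upper)

result = bytes(longarray)         # convert back only if needed
print(result[:13])
```

If the edited chunk comes back with a different length, the bytearray grows
or shrinks to fit, so beg and end offsets computed earlier for later chunks
would need adjusting.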
________________________________________________________________________
TonyN.:'                        *firstname*nlsnews at georgea*lastname*.com
      '                                  <http://www.georgeanelson.com/>


