modifying small chunks from long string

Bengt Richter bokr at oz.net
Mon Nov 14 10:53:32 EST 2005


On 13 Nov 2005 22:57:50 -0800, "MackS" <mackstevenson at hotmail.com> wrote:

>Hello everyone
>
>I am faced with the following problem. For the first time I've asked
>myself "might this actually be easier to code in C rather than in
>python?", and I am not looking at device drivers. : )
>
>This program is meant to process relatively long strings (10-20 MB) by
>selectively modifying small chunks one at a time. Eg, it locates
>approx. 1000-2000 characters and modifies them. Currently I was doing
>this using a string object but it is getting very slow: although I only
>modify a tiny bit of the string at a time, a new entire string gets
>created whenever I "merge" it with the rest. Eg,
>
>shortstr = longstr[beg:end]
>
># edit shortstr...
>
>longstr = longstr[:beg] + shortstr + longstr[end:] # new huge string is
>created!!
>
The usual way is to accumulate the edited short pieces of the new version of longstr
in a list and then join them once, if you really need the new longstr in a single piece
for something. I.e., (untested sketch)

    chunks_of_new_longstr = []
    for chunk in chunker(longstr):  # chunker is whatever yields your pieces in order
        #edit chunk (your shortstr)
        chunks_of_new_longstr.append(chunk) # or do several appends of pieces from the editing of a chunk
    longstr = ''.join(chunks_of_new_longstr)
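
To make the accumulate-then-join idea concrete, here is a minimal sketch. It assumes the positions to edit are known up front as sorted, non-overlapping (beg, end) pairs, which may not match how your program locates them. The names edit_regions, regions and edit are made up for illustration.

```python
def edit_regions(longstr, regions, edit):
    """Return a new string with each (beg, end) slice replaced by edit(slice).

    Assumes regions is sorted by position and the regions do not overlap.
    """
    pieces = []
    pos = 0
    for beg, end in regions:
        pieces.append(longstr[pos:beg])        # untouched text before the region
        pieces.append(edit(longstr[beg:end]))  # edited chunk, may change length
        pos = end
    pieces.append(longstr[pos:])               # untouched tail
    return ''.join(pieces)                     # one big string, built once
```

Each slice is copied once into the list and once more by the join, so the total work should stay roughly proportional to the length of longstr rather than growing with the number of edits.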

But if you don't really need it except to write it to output and the next thing would be
    open('longstr.txt','wb').write(longstr)  # might want 'w' instead of 'wb' for plain text data

then don't bother joining into a new longstr but do
    open('longstr.txt','wb').writelines(chunks_of_new_longstr)

instead. But if you are going to do that, why not just
    fout = open('longstr.txt','wb')
before the loop, and
        fout.write(chunk)
in place of
        chunks_of_new_longstr.append(chunk)
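
A sketch of that write-as-you-go variant, under the same assumption of known, sorted, non-overlapping (beg, end) regions. write_edited and its parameters are hypothetical names. It opens the file in text mode with newline='' so a str is written without newline translation; with bytes data you would use 'wb' instead.

```python
def write_edited(longstr, regions, edit, path):
    """Write longstr to path, replacing each (beg, end) slice by edit(slice).

    No joined copy of the result is ever built in memory.
    """
    with open(path, 'w', newline='') as fout:   # file is closed when the block exits
        pos = 0
        for beg, end in regions:
            fout.write(longstr[pos:beg])         # untouched text before the region
            fout.write(edit(longstr[beg:end]))   # edited chunk
            pos = end
        fout.write(longstr[pos:])                # untouched tail
```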

Of course, if longstr is coming from a file, maybe you can have
the chunker operate on a file instead of a longstr in memory.
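
For instance, a chunker over a file could look something like the sketch below. It reads fixed-size blocks, so it only fits edits that never straddle a block boundary (or that you can realign yourself). file_chunks, process_stream and the 64 KB default are assumptions for illustration only.

```python
def file_chunks(fin, size=64 * 1024):
    """Yield successive blocks of at most size characters (or bytes) from fin."""
    while True:
        chunk = fin.read(size)
        if not chunk:          # empty read means end of file
            break
        yield chunk


def process_stream(fin, fout, edit, size=64 * 1024):
    """Copy fin to fout, passing every block through edit on the way."""
    for chunk in file_chunks(fin, size):
        fout.write(edit(chunk))
```

With this shape, only one block at a time has to be in memory, and fin and fout can be any objects with read and write methods.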

>Can I get over this performance problem without reimplementing the
>whole thing using a barebones list object? I thought I was being "smart"
>by avoiding editing the long list, but then it struck me that I am
>creating a second object of the same size when I put the modified
>shorter string in place...
>
I imagine you should be able to change a very few lines to switch between
ways of getting your input stream of editable chunks and accumulating your output.

OTOH, this is all guesswork without more context ;-)

Regards,
Bengt Richter


