Splitting a string every 'n'

Alex Martelli aleax at aleax.it
Tue Jul 9 10:09:39 EDT 2002


Simon.Foster at smiths-aerospace.com wrote:

> 
> What is the idiomatic way to split a string into a list
> containing 'n' character substrings?  I normally do

I'm not sure there is just one.  I suspect what _feels_
idiomatic to you in this respect depends on where you're
coming from -- I just saw Harvey Thomas post an re-based
(regular expression) solution that is surely quite correct
(and may even have good performance!) but would just never
occur to me first thing...
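
For readers who have not seen one, a regular-expression approach might look
roughly like the sketch below.  This is only a guess at the general shape of
such a solution, not a copy of the code that was posted; the name
split_every_re is made up for illustration, and the syntax is Python 3.

```python
import re

def split_every_re(strng, n):
    # Hypothetical helper, not from the original thread.
    # '.{1,n}' greedily matches runs of up to n characters; re.DOTALL
    # lets '.' match newlines too, so no character is silently dropped.
    return re.findall('.{1,%d}' % n, strng, re.DOTALL)

print(split_every_re('abcdefgh', 3))  # ['abc', 'def', 'gh']
```

Note that n has to be at least 1 for the pattern to make sense, and that the
last piece is shorter whenever len(strng) is not a multiple of n.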

> something like:
> 
> while strng:
>     substring = strng[:n]
>     strng = strng[n:]
>     <process substring>
> 
> But the performance of this is hopeless for very long strings!

Definitely!

> Presumably because there's too much list reallocation?  Can't Python

Yep.

> just optimise this by shuffling the start of the list forward?

Not without a lot of trouble that would definitely complicate
the interpreter's code and quite possibly deteriorate performance
for all normal cases that can't easily benefit from such
"sharing" of pieces of one string.
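
To put rough numbers on the cost, here is a small sketch that just counts
characters copied, assuming every slice copies the characters it returns
(which is how string slicing behaves in CPython).  The function names are
made up for illustration, and the syntax is Python 3.

```python
def copied_by_truncation(length, n):
    # Models the while-loop: each pass takes strng[:n] and then
    # rebuilds strng as strng[n:], which copies everything left over.
    copied = 0
    remaining = length
    while remaining:
        piece = min(n, remaining)
        copied += piece        # cost of strng[:n]
        remaining -= piece
        copied += remaining    # cost of strng[n:]
    return copied

def copied_by_indexing(length, n):
    # Models the indexed loop: each strng[i:i+n] copies only its own piece.
    return sum(min(n, length - i) for i in range(0, length, n))

print(copied_by_truncation(1000, 10), copied_by_indexing(1000, 10))
# prints: 50500 1000
```

Under that assumption the truncating loop copies on the order of
len(strng)**2 / n characters in total, while plain indexing copies each
character exactly once.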


> Any better ideas, short of manually indexing through?  Is there

What's wrong with "manually indexing through"?  I assume you mean:

for i in xrange(0, len(strng), n):
    substring = strng[i:i+n]
    process(substring)

and I don't see anything wrong with it -- though I might shrink
it a bit down to

for i in xrange(0, len(strng), n):
    process(strng[i:i+n])

that's basically the same idea.

I'm honestly having a hard time seeing anything wrong with this
solution, and I'd presumably need to see something wrong with it
in order to come up with anything BETTER.  DIFFERENT is easy:
e.g., on Python 2.3 (or on 2.2 with "from __future__ import
generators"), why not a generator:

def slicer(strng, n):
    for i in xrange(0, len(strng), n):
        yield strng[i:i+n]

and then

for substring in slicer(strng, n):
    process(substring)

but that's really the same code again with a false moustache...
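
For anyone trying this on Python 3, where xrange no longer exists, a minimal
sketch of the same generator with range() swapped in might look like this
(the sample string is just an illustration):

```python
def slicer(strng, n):
    # Same generator as above, with range() in place of Python 2's xrange().
    for i in range(0, len(strng), n):
        yield strng[i:i+n]

print(list(slicer('abcdefgh', 3)))  # ['abc', 'def', 'gh']
```

The last piece comes out shorter when len(strng) is not a multiple of n, and
n must be non-zero, since range() rejects a step of 0.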


Alex



