speeding up string.split()

Duncan Booth duncan at NOSPAMrcp.co.uk
Fri May 25 05:44:26 EDT 2001


Chris Green <cmg at uab.edu> wrote in 
news:m2n182cs9c.fsf at phosphorus.tucc.uab.edu:

> Is there any way to speed up the following code?  Speed doesn't matter
> terribly much to me but this seems to be a fair example of what I need
> to do.

You haven't given much to go on here. Any real speedups are likely to 
depend very much on what you want to do with the data after you have split 
it.

> #!/usr/bin/python
> from string import split
> 
> for i in range(300000):
>     array = split('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6' +  
>                   '1064 80  54 54 1 1 14:00:00.8094 14:00:00.8908 1 2')
> 

Speedups to the above code:
1. The variable array is not used after it is assigned, and the value 
assigned is the same on every pass. Factor the assignment out of the loop.
2. After step 1 the loop is empty, so remove the loop.
1 and 2 together provide a massive speed improvement with no loss of 
functionality to the code as given.
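
A minimal sketch of steps 1 and 2, written for Python 3 (where the string 
module no longer has a split function, so the str.split method is used 
instead). The names original_loop and hoisted are made up for illustration; 
the point is only that a loop-invariant value can be computed once with the 
same result. The literal is copied from the quoted code, so '6' and '1064' 
join into the single field '61064'.

```python
# The constant line from the quoted code, copied as-is
LINE = ('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6' +
        '1064 80  54 54 1 1 14:00:00.8094 14:00:00.8908 1 2')


def original_loop(n=300000):
    """The quoted loop: splits the same constant string n times."""
    array = None
    for _ in range(n):
        array = LINE.split()
    return array


def hoisted():
    """Steps 1 and 2: the split is loop-invariant, so do it once."""
    return LINE.split()


if __name__ == "__main__":
    # Both versions leave the same list behind
    print(original_loop() == hoisted())  # prints True
```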

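Alternatively:
3. Put the code inside a function, so that its variables are locals, which 
are faster to look up than globals.
4. Use the split method on the string instead of the split function from 
the string module.
5. Write the two pieces as adjacent string literals, which the compiler 
joins into one constant, instead of joining them with '+' at run time.
3, 4 and 5 together knock about 25% off the running time.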
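
Steps 3 to 5 might look like this as a Python 3 sketch (split_many is a 
hypothetical name). The 25% figure above was measured on the interpreters 
of the time; recent CPython versions may already fold a '+' between two 
constant strings at compile time, so step 5 may make little difference 
today. Time it on your own setup rather than relying on the number.

```python
import timeit


def split_many(n=300000):
    """Steps 3-5 applied to the quoted loop (Python 3 sketch)."""
    array = None
    # Step 3: array and the loop counter are local variables here
    for _ in range(n):
        # Step 5: adjacent literals form one constant at compile time,
        # so no '+' is evaluated on each pass.
        # Step 4: call the str method rather than string.split().
        array = ('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6'
                 '1064 80  54 54 1 1 14:00:00.8094 14:00:00.8908 1 2').split()
    return array


if __name__ == "__main__":
    # Seconds for one run of the whole loop; compare with your baseline
    print(timeit.timeit(split_many, number=1))
```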

6. If whatever you intend to do with the data involves filtering it on the 
first field or two, then using "xxx...".split(' ', 1) (or split(' ', 2) for 
the first two fields) is very much faster than splitting up all the fields. 
This can easily cut the time by two thirds.
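
A minimal sketch of step 6, assuming the filter only needs the first field 
and the fields are separated by single spaces. filter_on_first_field and the 
sample records are hypothetical, made up for illustration. Only lines that 
pass the cheap maxsplit test pay for a full split.

```python
def filter_on_first_field(lines, wanted):
    """Yield the fully split fields of lines whose first field is `wanted`.

    Hypothetical helper for illustration only.
    """
    for line in lines:
        # maxsplit=1: split at the first space only -> [first, rest]
        first = line.split(' ', 1)[0]
        if first == wanted:
            # Only lines that pass the filter pay for a full split
            yield line.split()


if __name__ == "__main__":
    # Made-up sample records in the same shape as the quoted line
    sample = [
        '10.0.0.1 10.0.0.2 6 1064 80',
        '10.0.0.3 10.0.0.2 6 22 80',
        '10.0.0.1 10.0.0.9 17 53 53',
    ]
    for fields in filter_on_first_field(sample, '10.0.0.1'):
        print(fields)
```

To filter on the first two fields instead, line.split(' ', 2)[:2] gives 
both leading fields while leaving the rest of the line unsplit.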

7. Use Perl, or C, or whatever else takes your fancy if speed is that 
critical.

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?



More information about the Python-list mailing list