speeding up string.split()
Duncan Booth
duncan at NOSPAMrcp.co.uk
Fri May 25 05:44:26 EDT 2001
Chris Green <cmg at uab.edu> wrote in
news:m2n182cs9c.fsf at phosphorus.tucc.uab.edu:
> Is there any way to speed up the following code? Speed doesn't matter
> terribly much to me but this seems to be a fair example of what I need
> to do.
You haven't given much to go on here. Any real speedups are likely to
depend very much on what you want to do with the data after you have split
it.
> #!/usr/bin/python
> from string import split
>
> for i in range(300000):
>     array = split('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6' +
>                   '1064 80 54 54 1 1 14:00:00.8094 14:00:00.8908 1 2')
>
Speedups to the above code:
1. The variable array is not used after it is assigned, and the value
assigned is constant. Factor the assignment out of the loop.
2. After 1, the loop is empty, so remove the loop.
1 and 2 together provide a massive speed improvement with no loss of
functionality to the code as given.
Alternatively:
3. Put the code inside a function.
4. Use the split method on the string instead of the split function.
5. Use implicit concatenation of adjacent string literals instead of '+'.
3, 4 and 5 together knock about 25% off the running time.
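A rough sketch of what 3, 4 and 5 look like when combined. The function name split_records and the count parameter are mine, not from the original post, and this is written for a current Python, where the string module no longer exports split. The speedup you actually see will depend on your interpreter version.

```python
def split_records(count):
    """Split the sample line `count` times and return the last result.

    Hypothetical helper, for illustration only.
    """
    fields = None
    # 3. Inside a function, `fields` is a local variable, which is
    #    cheaper to assign than a module-level global.
    for _ in range(count):
        # 5. Adjacent string literals are joined at compile time,
        #    so no '+' is evaluated on each pass through the loop.
        # 4. The split method avoids the lookup and call overhead of
        #    the string.split function.
        fields = ('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6'
                  '1064 80 54 54 1 1 14:00:00.8094 14:00:00.8908 1 2').split()
    return fields


fields = split_records(1000)
```

Note that the two literals join without a space, exactly as in the code you posted, so the third field comes out as '61064'.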
6. If whatever you intend to do with the data involves filtering it on the
first field or two, then using "xxx...".split(' ', 1) is very much faster
than splitting up all the fields. This can reduce the time by two thirds
easily.
7. Use Perl, or C, or whatever else takes your fancy if speed is that
critical.
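To illustrate 6, here is a minimal sketch that assumes you only care about the first field. The helper name matches_source and the idea of comparing against a wanted address are my own assumptions about what the filtering might look like.

```python
LINE = ('xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy 6'
        '1064 80 54 54 1 1 14:00:00.8094 14:00:00.8908 1 2')


def matches_source(line, wanted):
    """Return True if the first space-separated field equals `wanted`.

    Hypothetical filter: split(' ', 1) stops after the first space,
    so the rest of the line is never broken into fields.
    """
    first = line.split(' ', 1)[0]
    return first == wanted


head_and_rest = LINE.split(' ', 1)
is_match = matches_source(LINE, 'xxx.xxx.xxx.xxx')
```

Only the lines that pass the filter then need the full split().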
--
Duncan Booth duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?