Splitting strings - by iterators?

John Machin sjmachin at lexicon.net
Fri Feb 25 18:17:58 EST 2005


Jeremy Sanders wrote:
> On Fri, 25 Feb 2005 17:14:24 +0100, Diez B. Roggisch wrote:
>
> > Maybe [c]StringIO can be of help. I don't know if its iterator is lazy. But
> > at least it has one, so you can try and see if it improves performance :)
>
> Excellent! I somehow missed that module. StringIO speeds up the iteration
> by a factor of 20!
>

Twenty?? StringIO.StringIO or cStringIO.StringIO???

I did some "timeit" tests using the code below, on 400,000 lines of 53
chars (uppercase + lowercase + '\n').

On my config (Python 2.4, Windows 2000, 1.4 GHz Athlon chip, not short
of memory), cStringIO took 0.18 seconds and the "hard way" took 0.91
seconds. StringIO.StringIO (not shown) took 2.9 seconds. FWIW, hoisting
the attribute look-up out of the loop (sfind = s.find) saves only about
0.1 seconds.

>python -m timeit -s "import itersplitlines as i; d = i.mk_data(400000)" "i.test_csio(d)"
10 loops, best of 3: 1.82e+005 usec per loop

>python -m timeit -s "import itersplitlines as i; d = i.mk_data(400000)" "i.test_gen(d)"
10 loops, best of 3: 9.06e+005 usec per loop
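
Converting the two timeit figures above to seconds and taking their ratio
gives a speed-up of roughly 5, not 20. A quick check of that arithmetic
(the variable names are illustrative only and appear nowhere in the test
script):

```python
# Per-loop timings reported by timeit above, in microseconds.
csio_usec = 1.82e5   # i.test_csio(d)
gen_usec = 9.06e5    # i.test_gen(d)

# Convert to seconds and compute how much faster cStringIO was.
csio_sec = csio_usec / 1e6
gen_sec = gen_usec / 1e6
speedup = gen_usec / csio_usec

print("cStringIO: %.2f s, generator: %.2f s, ratio: %.1fx"
      % (csio_sec, gen_sec, speedup))
# prints: cStringIO: 0.18 s, generator: 0.91 s, ratio: 5.0x
```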

A few questions:
(1) What is your equivalent of the "hard way"? What [c]StringIO code
did you use?
(2) How did you measure the time?
(3) How long does it take to *compile* your 400,000-line Python script?

!import cStringIO
!
!def itersplitlines(s):
!   if not s:
!      yield s
!      return
!   pos = 0
!   sfind = s.find
!   epos = len(s)
!   while pos < epos:
!      newpos = sfind('\n', pos)
!      if newpos == -1:
!         yield s[pos:]
!         return
!      yield s[pos:newpos+1]
!      pos = newpos+1
!
!def test_gen(s):
!   for z in itersplitlines(s):
!      pass
!
!def test_csio(s):
!   for z in cStringIO.StringIO(s):
!      pass
!
!def mk_data(n):
!   import string
!   return (string.lowercase + string.uppercase + '\n') * n



