[Tutor] String Tokenizer - As in Java

Kalle Svensson kalle at lysator.liu.se
Tue Aug 19 18:35:19 EDT 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[Marc Barry]
> Although, my interpretation is that 'split' only allows the
> definition of one separator.  This is okay for most things, but I
> have strings that I would like to split with two separators such as
> '.' and '@'.  I don't think that I can use split to handle this and
> therefore will have to resort to something more powerful
> (i.e. regular expressions).

I once needed a split function for more than one separator, so I wrote
one.  Then, for reasons I can't remember, I wrote another one and some
code to test which of the two was faster.

  def multisplit1(s, seps):
      res = s.split(seps[0])
      for sep in seps[1:]:
          tmp = []
          for r in res:
              tmp += r.split(sep)
          res = tmp
      return res
  
  def multisplit2(s, seps):
      res = [s]
      for i in seps:
          res2 = res
          res = []
          for j in res2:
              res += j.split(i)
      return res
  
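  import time
  apan = []  # collects each function's result so they can be compared
  for m in multisplit1, multisplit2:
      s = "a;b:c;defg:xxxxx:" * 100000
      seps = [":", ";", "l"]
      x = time.time()
      apan.append(m(s, seps))
      # wall-clock seconds taken by this function
      print(time.time() - x)
  # both implementations must produce the same list
  assert apan[0] == apan[1]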

On my system, the first function runs this test slightly faster, though
that could change with a different test case.  A regexp might be faster
still, since the matching is done in C code (an exercise for the
interested reader: write multisplit3 that uses the re module), but
the plain-split versions above might be clearer.
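
For anyone who wants to check their answer to the exercise, here is one
possible multisplit3.  It is only a sketch, not something that was timed
against the two functions above.  It assumes that seps is a non-empty
list of non-empty strings, and it escapes each separator with re.escape
so that characters like '.' are matched literally.

```python
import re

def multisplit3(s, seps):
    # Escape every separator so regex metacharacters such as "." are
    # treated as plain text, then join them into a single alternation.
    pattern = "|".join(re.escape(sep) for sep in seps)
    # One pass over the string splits on any of the separators.
    return re.split(pattern, s)

# The '.' and '@' case from the original question, on a made-up address.
print(multisplit3("user@example.com", [".", "@"]))
```

For single-character separators this should give the same list as
multisplit1 and multisplit2.  With longer separators that overlap each
other the results could differ, because the regexp looks for all
separators in a single pass instead of splitting once per separator.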

Peace,
  Kalle
- -- 
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.6 <http://mailcrypt.sourceforge.net/>

iD8DBQE/QkOwdNeA1787sd0RAgY7AKDOzJzBqT5M75flkutrYviAYSkgcgCgqyIl
uEIdMHXr0jyvT3vugNUuVfc=
=M+CY
-----END PGP SIGNATURE-----


