[Tutor] String Tokenizer - As in Java
Kalle Svensson
kalle at lysator.liu.se
Tue Aug 19 18:35:19 EDT 2003
[Marc Barry]
> Although, my interpretation is that 'split' only allows the
> definition of one separator. This is okay for most things, but I
> have strings that I would like to split with two separators such as
> '.' and '@'. I don't think that I can use split to handle this and
> therefore will have to resort to something more powerful
> (i.e. regular expressions).
I once needed a split function for more than one separator, so I wrote
one. Then, for reasons I can't remember, I wrote another one and some
code to test which was faster.

def multisplit1(s, seps):
    res = s.split(seps[0])
    for sep in seps[1:]:
        tmp = []
        for r in res:
            tmp += r.split(sep)
        res = tmp
    return res

def multisplit2(s, seps):
    res = [s]
    for i in seps:
        res2 = res
        res = []
        for j in res2:
            res += j.split(i)
    return res

import time

# Time each implementation on the same input, then check that they agree.
apan = []
for m in multisplit1, multisplit2:
    s = "a;b:c;defg:xxxxx:" * 100000
    seps = [":", ";", "l"]
    x = time.time()
    apan.append(m(s, seps))
    print(time.time() - x)
assert apan[0] == apan[1]

On my system, the first function runs this test a little bit faster.
I assume that could change with the test case or whatever. I think a
regexp might be faster since that's C code (an exercise for the
interested reader: write multisplit3 that uses the re module), but
this might be clearer.
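
For the exercise, here is one possible shape for multisplit3. It is only a
sketch and not code from the original post. It assumes every separator is a
non-empty literal string (hence re.escape) and that the separators do not
overlap one another. With single-character separators it should return the
same list as the two functions above, but overlapping multi-character
separators may split differently than repeated str.split calls. The guard
for an empty separator list is an added assumption as well.

```python
import re

def multisplit3(s, seps):
    # Hypothetical answer to the exercise; not part of the original post.
    if not seps:
        # Nothing to split on: mirror multisplit2 and return the string whole.
        return [s]
    # Escape each separator so characters such as '.' match literally,
    # then join them into a single alternation pattern.
    pattern = "|".join([re.escape(sep) for sep in seps])
    return re.split(pattern, s)
```

For the '.' and '@' case from the question,
multisplit3("user@example.com", [".", "@"]) gives
['user', 'example', 'com']. Whether this beats the pure-Python versions
would have to be timed with the same test as above.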
Peace,
Kalle
--
Kalle Svensson, http://www.juckapan.org/~kalle/
Student, root and saint in the Church of Emacs.