Tokenize
Andrew Dalke
adalke at mindspring.com
Thu Jul 24 15:51:43 EDT 2003
Ken Fetting wants a 'StringTokenizer'.
Alan Kennedy points out
> >>> s = "This is a string to be tokenised"
> >>> s.split()
> ['This', 'is', 'a', 'string', 'to', 'be', 'tokenised']
...
> Or maybe you have something more specific in mind?
Another option is the little-known 'shlex' module, part of the standard
library.
>>> import shlex, StringIO
>>> infile = StringIO.StringIO("""ls -lart "have space.*" will travel""")
>>> x = shlex.shlex(infile)
>>> x.get_token()
'ls'
>>> x.get_token()
'-'
>>> x.get_token()
'lart'
>>> x.get_token()
'"have space.*"'
>>> x.get_token()
'will'
>>> x.get_token()
'travel'
>>> x.get_token()
''
>>>
As you can see, it treats '-' unexpectedly compared to the shell: '-lart' is split into two tokens.
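One workaround, sketched here assuming a newer Python where shlex.shlex accepts a plain string directly, is to add '-' to the lexer's wordchars so that option-like arguments stay whole:

```python
import shlex

lex = shlex.shlex('ls -lart "have space.*" will travel')
lex.wordchars += '-'   # treat '-' as part of a word, as the shell does
print([lex.get_token() for _ in range(5)])
# ['ls', '-lart', '"have space.*"', 'will', 'travel']
```

Note that in the default (non-POSIX) mode the quotes are kept around the quoted token, just as in the transcript above.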
Also, with __iter__ available in newer Pythons, if this module proves useful
it would be nice if "for token in shlex_instance: ..." worked.
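For what it's worth, in current Python 3 a shlex instance is iterable, and shlex.split gives the shell-like splitting in one call; a minimal sketch:

```python
import shlex

# shlex.split applies POSIX shell rules: quotes are honored and stripped
print(shlex.split('ls -lart "have space.*" will travel'))
# ['ls', '-lart', 'have space.*', 'will', 'travel']

# "for token in ..." works directly on a shlex instance
lex = shlex.shlex('ls -lart "have space.*" will travel')
lex.whitespace_split = True   # split on whitespace only, keeping '-lart' whole
for token in lex:
    print(token)
```

In non-POSIX mode (the shlex.shlex default) the quotes remain part of the token, so the loop prints '"have space.*"' with its quotes.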
Andrew
dalke at dalkescientific.com