Tokenize

Andrew Dalke adalke at mindspring.com
Thu Jul 24 15:51:43 EDT 2003


Ken Fetting wants a 'StringTokenizer'.

Alan Kennedy points out
> >>> s = "This is a string to be tokenised"
> >>> s.split()
> ['This', 'is', 'a', 'string', 'to', 'be', 'tokenised']
   ...
> Or maybe you have something more specific in mind?

Another option is the little-known 'shlex' module, part of the standard
library.

>>> import shlex, StringIO
>>> infile = StringIO.StringIO("""ls -lart "have space.*" will travel""")
>>> x = shlex.shlex(infile)
>>> x.get_token()
'ls'
>>> x.get_token()
'-'
>>> x.get_token()
'lart'
>>> x.get_token()
'"have space.*"'
>>> x.get_token()
'will'
>>> x.get_token()
'travel'
>>> x.get_token()
''
>>>

As you can see, it treats '-' differently from the shell: '-lart' comes
back as two tokens, '-' and 'lart'.
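
One workaround (a quick sketch, only lightly checked) is to add '-' to the
instance's wordchars attribute so that '-lart' comes back as one token:

>>> x = shlex.shlex(StringIO.StringIO("ls -lart"))
>>> x.wordchars = x.wordchars + "-"
>>> x.get_token()
'ls'
>>> x.get_token()
'-lart'
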
Also, now that newer Pythons have __iter__, it would be nice if the module
let you write "for token in x: ..." directly instead of calling get_token()
until it returns ''.
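
In the meantime, here's a rough sketch of such a wrapper (the name
'iter_tokens' is mine, it needs a Python recent enough to have generators,
and it relies on the default non-POSIX behaviour of get_token() returning
'' at end of input):

>>> def iter_tokens(lex):
...     token = lex.get_token()
...     while token:
...         yield token
...         token = lex.get_token()
...
>>> infile = StringIO.StringIO("ls -lart will travel")
>>> list(iter_tokens(shlex.shlex(infile)))
['ls', '-', 'lart', 'will', 'travel']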

                    Andrew
                    dalke at dalkescientific.com
