tcl-style string parsing
Guido van Rossum
guido at cnri.reston.va.us
Tue Oct 19 17:13:59 EDT 1999
Dev <devnull at fleetingimage.com> writes:
> I have a string consisting of multiple strings---
>
> a = '"This, m\'dear," is "an example" "of a parsing problem."'
>
> I would like to efficiently convert this to a list (or tuple):
>
> b = ["This, m'dear,", "is", "an example", "of a parsing problem."]
>
> In the source string, note that whitespace
> within a quoted string should be retained,
> whitespace outside a quoted string should be ignored,
> and strings without whitespace don't need to be quoted.
>
> I'm converting some code from TCL, where this is trivial.
> (The original string can be treated and indexed as a list.)
> I've not found a suitable re expression for this, nor a set
> of string replacements.
Your problem seems to be designed with Tcl in mind. It is a parsing
problem. It so happens that Tcl stole some of its lexing ideas from
the shell, and Python 1.5.2 happens to have a handy module, shlex by
Eric Raymond, that nearly solves your problem:
>>> import shlex
>>> import StringIO
>>> f = StringIO.StringIO(a)
>>> s = shlex.shlex(f)
>>> l = []
>>> while 1:
        t = s.get_token()
        if not t: break
        l.append(t)
        print `t`

'"This, m\'dear,"'
'is'
'"an example"'
'"of a parsing problem."'
>>> print l
['"This, m\'dear,"', 'is', '"an example"', '"of a parsing problem."']
>>>
Note that shlex leaves the string quotes around the quoted tokens;
these are easily removed by adding something like
    if len(t) >= 2 and t[0] == '"' == t[-1]:
        t = t[1:-1]
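Putting the loop and the quote-stripping together, a minimal sketch might look like the following. It is written for present-day Python 3 rather than 1.5.2, so io.StringIO stands in for the StringIO module, and the function name split_quoted is purely illustrative and not part of shlex.

```python
import io
import shlex


def split_quoted(text):
    """Split text on whitespace, keeping whitespace inside double quotes.

    Hypothetical helper for illustration; only shlex.shlex and get_token
    come from the standard library.
    """
    lexer = shlex.shlex(io.StringIO(text))
    tokens = []
    while True:
        t = lexer.get_token()
        if not t:  # an empty token signals end of input
            break
        # shlex keeps the surrounding quotes; strip one matching pair.
        if len(t) >= 2 and t[0] == '"' == t[-1]:
            t = t[1:-1]
        tokens.append(t)
    return tokens


a = '"This, m\'dear," is "an example" "of a parsing problem."'
b = split_quoted(a)
print(b)
```

As in the session above, this only "nearly" matches Tcl: an unquoted word containing punctuation such as a period is still split by shlex's default word rules. Later Python releases (2.3 onward) also provide shlex.split(), which removes the quotes itself and may be enough on its own for input like this.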
--Guido van Rossum (home page: http://www.python.org/~guido/)