lexing nested parenthesis (for a Python Unix Shell)
Bengt Richter
bokr at oz.net
Fri Aug 2 16:15:46 EDT 2002
On Wed, 31 Jul 2002 17:30:40 -0400, Dave Cinege <dcinege at psychosis.com> wrote:
>On Wednesday 31 July 2002 16:32, Bengt Richter wrote:
>
>> if 1 and (var1 or qm('-d /etc/')):
>>
>> would already be legal Python.
>
>That's not the point. I'm not making legal Python but a 'short hand'
>subset, specifically a Python Unix Shell (aka bourne shell replacement)
>
>The idea right now is to parse and replace the short hand
>with python constructs that are predefined, and then let python
>run all of it. It's either that or I basically handle ALL the
>parsing and pretty much recreate the wheel.
>
>You see I only want to deal with 'my' subset...python will then run
>(via compile()) and handle the remaining grammar, indentation, etc.
>
>Things get a bit more difficult in interactive mode, but I feel this
>is still the best route.
>
>> Maybe a few examples with un-nested and nested parens (and nested ?(...)
>> constructs??) together with what Python you would like to have them
>> transformed to would get you some useful help.
>
>I haven't specced it all out yet, but I've pretty much decided I want to
>contain almost everything within ()'s and prefix the first ( with an identifier.
>
>To put things in perspective:
> In bash sh: [ -d /etc/ ]
> In pysh: =(-d /etc/) (Maybe =('-d /etc/') )
>
> At runtime it will be parsed and replaced by:
> pysh_test('-d', '/etc/')
>
>Some things will not be as easy as this, as they will not be a
>simple function name replacement. It can get ugly when I need
>to work recursively through nested functions. I need to work
>on the next item first so I know how to handle output.
>
>I.e., if the return is normally a list, and it's nested in
>something that requires a string, I have to account for that.
>
>In bash:
> for line in $(cat *.py); do echo $line; done # Yep time to retire this POS
>
>In pysh
> for $line in !(cat *(*.py)): print $line ;; # Ain't it pretty?
>
> FYI
> $ == variable prefix (I might be able to avoid using this, dunno)
> !() == Command Substitution
> *() == Shell glob (might become seamless, i.e. I search for glob chars!)
> ;; == explicit newline
>Parsed to python:
> for line in pysh_cmdsub_inpath('cat',pysh_ListToArgStr(pysh_glob('*.py'))):
>     print line # You can visualize the pysh functions...
>
>> 'Better' is a waste of time unless we're working on the real problem ;-)
>
>The problem is Python already works. : > I KNOW how I can do all this, I just
>don't feel like writing a complete parser if I can reuse something Python
>itself already uses for parsing.
>
You might consider breaking the source of your 'shorthand' into tokens of interest
using re. Jonathan Hogg has already provided a leg up. Your special shorthand
expressions seem to be parentheses with a prefix character [!*], or ';;', or $name,
and presumably unprefixed parens should work as usual. I suspect you could just write
name instead of $name in the first place, unless you need to do something special with
specially designated names, but we'll keep it in. I don't know exactly what ';;' is
supposed to do, but if you do this:
>>> import re
>>> sh = "for $line in !(cat *(*.py)): print $line ;; # Ain't it pretty?"
>>> splitre = re.compile(r'([$!*][(]|[()]|[$][a-zA-Z_]\w*|;;)')
>>> pieces = splitre.split(sh)
>>> pieces
['for ', '$line', ' in ', '!(', 'cat ', '*(', '*.py', ')', '', ')', ': print ', '$line', ' ', ';;', " # Ain't it pretty?"]
maybe the pieces list will give you ideas on how to convert it, e.g.,
(ignoring indentation problems and other things we don't know about yet ;-)
>>> def munge(pieces, i=0):
...     ret = []
...     ihi = len(pieces)
...     while i < ihi:
...         p = pieces[i]
...         ps = p.strip()
...         if not p:
...             pass
...         elif not ps or ps[0] not in '$!*();':
...             ret.append(p)
...         elif ps[0] == '$':
...             ret.append(p[1:]) # just strip dollar sign for now
...         elif ps == '!(':
...             # always recurse on any left paren, and return on right paren
...             ret.append('pysh_cmdsub_inpath(')
...             ret.append(`pieces[i+1].strip()`)
...             ret.append(', ') # comma after arg
...             s, i = munge(pieces, i+2) # returned i should be index of last item used
...             ret.append(s)
...         elif ps == '*(':
...             ret.append('pysh_ListToArgStr(pysh_glob(')
...             ret.append(`pieces[i+1].strip()`)
...             s, i = munge(pieces, i+2) # returned i should be index of last item used
...             ret.append(s+')') # need extra right paren to close ListToArgStr, whatever that is ;-)
...         elif ps == ';;':
...             ret.append('\n') # ??
...         elif ps == ')':
...             ret.append(')')
...             return ''.join(ret), i
...         elif ps == '(':
...             ret.append('(')
...             s, i = munge(pieces, i+1)
...             ret.append(s)
...         else:
...             ret.append(p)
...         i += 1
...     return ''.join(ret), i-1
...
>>> munge(pieces)
("for line in pysh_cmdsub_inpath('cat', pysh_ListToArgStr(pysh_glob('*.py'))): print line \n # Ain't it pretty?", 14)
>>> print munge(pieces)[0]
for line in pysh_cmdsub_inpath('cat', pysh_ListToArgStr(pysh_glob('*.py'))): print line
# Ain't it pretty?
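For anyone trying this on a current Python (3.x), where backtick repr and the print statement no longer exist, here is a minimal sketch of the same split-and-munge steps. It uses \w* in the name pattern so that one-letter names such as $x are matched too. translate is a helper name introduced here only for illustration, and the pysh_* names are merely emitted as text, never called.

```python
import re

# Token pattern for the shorthand: prefixed parens, bare parens, $name, and ';;'.
# \w* (zero or more) lets one-letter names such as $x match.
SPLIT_RE = re.compile(r'([$!*][(]|[()]|[$][a-zA-Z_]\w*|;;)')


def munge(pieces, i=0):
    """Translate tokens from index i; return (text, index of last token used)."""
    ret = []
    while i < len(pieces):
        p = pieces[i]
        ps = p.strip()
        if not p:
            pass  # empty string that re.split leaves between adjacent delimiters
        elif not ps or ps[0] not in '$!*();':
            ret.append(p)  # ordinary text passes through unchanged
        elif ps[0] == '$':
            ret.append(p[1:])  # just strip the dollar sign
        elif ps == '!(':
            # command substitution: the next piece is taken as the command name
            ret.append('pysh_cmdsub_inpath(' + repr(pieces[i + 1].strip()) + ', ')
            s, i = munge(pieces, i + 2)  # recurse until the matching ')'
            ret.append(s)
        elif ps == '*(':
            ret.append('pysh_ListToArgStr(pysh_glob(' + repr(pieces[i + 1].strip()))
            s, i = munge(pieces, i + 2)
            ret.append(s + ')')  # extra paren closes pysh_ListToArgStr
        elif ps == ';;':
            ret.append('\n')
        elif ps == ')':
            ret.append(')')
            return ''.join(ret), i  # hand control back to the caller's level
        elif ps == '(':
            ret.append('(')
            s, i = munge(pieces, i + 1)
            ret.append(s)
        else:
            ret.append(p)
        i += 1
    return ''.join(ret), i - 1


def translate(line):
    """Hypothetical helper: split one line of shorthand and return the rewritten text."""
    return munge(SPLIT_RE.split(line))[0]
```

Calling translate("for $line in !(cat *(*.py)): print $line") goes through the same steps as the session above, minus the ';;' and the trailing comment.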
(Not tested beyond what you see, and surely not exactly what you need, but it might
give you some ideas. BTW, I think pysh_ListToArgStr probably doesn't belong there. I'd
either include its functionality in pysh_glob or let pysh_cmdsub_inpath handle it
internally based on what it gets as a second arg, whichever fits better with how things
factor in your overall vision. And ';;' almost certainly doesn't do what you had in mind ;-)
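To make the second option concrete, here is a rough sketch (current Python 3 syntax) of a pysh_cmdsub_inpath that accepts either a string or a list as its second argument, so that no separate pysh_ListToArgStr step is needed. The function bodies are pure assumptions on my part, since the real pysh functions have not been shown, and build_argv is a helper name made up for this example.

```python
import glob
import shlex
import subprocess


def pysh_glob(pattern):
    """Assumed behaviour: expand a shell glob into a sorted list of paths."""
    return sorted(glob.glob(pattern))


def build_argv(cmd, args=()):
    """Hypothetical helper: combine a command name with args given as str or sequence."""
    if isinstance(args, str):
        args = shlex.split(args)  # split a plain string the way a POSIX shell would
    return [cmd] + list(args)


def pysh_cmdsub_inpath(cmd, args=()):
    """Assumed behaviour: run cmd (looked up via PATH) and return its output lines."""
    result = subprocess.run(build_argv(cmd, args),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

With something like this, the translator could emit pysh_cmdsub_inpath('cat', pysh_glob('*.py')) directly, and a hand-written pysh_cmdsub_inpath('ls', '-l /etc/') would work as well.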
Regards,
Bengt Richter