lexing nested parenthesis (for a Python Unix Shell)
Jim Meier
jim at dsdd.org
Fri Aug 2 01:45:48 EDT 2002
On Wed, 31 Jul 2002 11:30:40 -0600, Dave Cinege wrote:
> On Wednesday 31 July 2002 16:32, Bengt Richter wrote:
>
>> if 1 and (var1 or qm('-d /etc/')):
>>
>> would already be legal Python.
>
> That's not the point. I'm not making legal Python but a 'short hand'
> subset, specifically a Python Unix Shell (aka bourne shell replacement)
>
> To put things in perspective:
> In bash sh: [ -d /etc/ ]
> In pysh: =(-d /etc/) (Maybe =('-d /etc/') )
>
> At runtime it will be parsed and replaced by:
> pysh_test('-d', '/etc/')

I think you definitely want to check out section 18 of the standard
library reference, specifically the 'tokenize' and 'parser' modules. They
will save you a huge amount of wheel-reinventing.

A good approach might be to lex your input with the 'tokenize' module,
apply simple fixups to patterns in the token stream, then rebuild a
source string and have Python run it. (Strangely, the 'parser' module has
no parsing function that takes tokens instead of strings.)
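
To make the token-fixup idea concrete, here is a rough sketch for the quoted
form =('-d /etc/') only. The names rewrite_tests and starts_expression are
made up for illustration, and the rule for telling a pysh test from an
ordinary assignment is a guess at what you would want, not a worked-out
design. It also assumes the quoted argument is a plain one-line string.

```python
import ast
import io
import keyword
import tokenize

# Token types that mean "we are at the start of a logical line".
LINE_START = (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT)


def starts_expression(prev):
    """Heuristic (an assumption): '=' opens a pysh test only where an
    expression could begin, so 'x = (...)' is left alone."""
    if prev is None or prev.type in LINE_START:
        return True
    if prev.type == tokenize.NAME:
        return keyword.iskeyword(prev.string)  # e.g. 'if', 'and', 'or'
    return prev.type == tokenize.OP and prev.string in ("(", "[", ",", ":")


def rewrite_tests(source):
    """Replace =('<words>') with pysh_test('<word>', ...) in source text."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    lines = source.splitlines(keepends=True)
    edits = []
    for i in range(len(toks) - 3):
        eq, lpar, arg, rpar = toks[i:i + 4]
        prev = toks[i - 1] if i else None
        if (eq.type == tokenize.OP and eq.string == "="
                and lpar.string == "(" and arg.type == tokenize.STRING
                and rpar.string == ")" and starts_expression(prev)):
            # Split the quoted text on whitespace to get the arguments.
            words = ast.literal_eval(arg.string).split()
            call = "pysh_test(%s)" % ", ".join(repr(w) for w in words)
            edits.append((eq.start, rpar.end, call))
    # Apply from the end so earlier column offsets stay valid.
    for (row, col), (_, end_col), call in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + call + line[end_col:]
    return "".join(lines)


source = "if 1 and (var1 or =('-d /etc/')):\n    pass\n"
print(rewrite_tests(source), end="")
# prints:
# if 1 and (var1 or pysh_test('-d', '/etc/')):
#     pass
```

The rebuilt string can then be handed to compile() and exec() with a
namespace that supplies pysh_test.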

If you want to work at the grammar level, have a look at John Aycock's
SPARK parsing toolkit, which comes with a skeleton Python grammar already
implemented (for Python 1.5.2, but it's a good start). You'll be able to
massage your parse tree into a nested-list representation that you can
feed directly into the 'parser' module's 'compileast' function.
URL for SPARK:
http://pages.cpsc.ucalgary.ca/~aycock/spark/
> In bash:
> for line in $(cat *.py); do echo $line; done # Yep time to retire this POS
>
> In pysh
> for $line in !(cat *(*.py)): print $line ;; # Ain't it pretty?
>
> FYI
> $ == variable prefix (I might be able to avoid using this, dunno)
> !() == Command Substitution
> *() == Shell glob (might become seamless, ie I search for glob chars!)
> ;; == explicit newline

Ugh, definitely avoid the abhorrent '$' syntax - this is the year 2002, we
can do better. Just use Python variables and provide a simple function or
statement to export particular variables to child processes.
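
Something along these lines is what I have in mind for exporting. The
function name export is made up, and going through os.environ is just one
way to do it; treat this as a sketch, not a design.

```python
import os
import subprocess
import sys


def export(**variables):
    """Copy the given Python values into the environment that child
    processes inherit (hypothetical helper, values are stringified)."""
    for name, value in variables.items():
        os.environ[name] = str(value)


greeting = "hello"          # an ordinary Python variable, no '$' needed
export(GREETING=greeting)   # only now does it become visible to children

# Run a child process and read the exported variable back.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['GREETING'])"],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())
```

Anything not passed to export stays a plain Python variable and never
reaches the child.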

For command substitution, I'd personally prefer something like rc's
syntax, i.e. `{cat *.py}, but since it basically comes down to your
favorite quoting character, it's not too important :)

The *() syntax will be difficult to parse around, and just gets in the way
of the user. I would try to avoid it.
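
As a point of comparison, command substitution and globbing can both be
ordinary function calls with no new syntax at all. The helper names
command_lines and expand below are invented for this sketch; the standard
'glob' module does the pattern matching.

```python
import glob
import subprocess
import sys


def command_lines(argv):
    """Run argv and return its standard output as a list of lines."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.splitlines()


def expand(pattern):
    """Return the sorted paths matching a shell-style glob pattern."""
    return sorted(glob.glob(pattern))


# Rough equivalent of:  for $line in !(cat *(*.py)): print $line
# (left as a comment because it needs a Unix 'cat' on PATH):
#     for line in command_lines(["cat"] + expand("*.py")):
#         print(line)

# Portable demonstration using the running Python interpreter as the command.
lines = command_lines([sys.executable, "-c", "print('one'); print('two')"])
for line in lines:
    print(line)
```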

I don't know what ';; == explicit newline' means. Can't the user just
press enter?

Let us know how the project goes.
-Jim