Entering strings as user input but interpreting as Python input (sort of)

Chris Carlen crcarleRemoveThis at BOGUSsandia.gov
Mon Sep 17 20:01:38 EDT 2007


Hi:

I'm writing a Python program, a hex line editor, which takes in a line 
of input from the user such as:

 >>> cmd = raw_input('-').split()
-e 01 02 "abc def" 03 04
 >>> cmd
['e', '01', '02', '"abc', 'def"', '03', '04']

Trouble is, I don't want to split the quoted part where the space occurs.

So I would prefer the resulting list to contain:

['e', '01', '02', '"abc def"', '03', '04']

Furthermore, if the user entered:

-e 01 02 "abc \"def\"\r\n" 03 04

I would want the quoted part to be interpreted as if I entered it into 
Python itself (recognize escape sequences, and not split at spaces) as:

 >>> s = '"abc \"def\"\r\n"'
 >>> print s
"abc "def"
"
 >>>

In other words, if a quoted string occurs in the user input, I want only 
that part to be treated as a Python string.  Even more horrifying is 
that I want the outer quotes to remain as is (which Python doesn't do, 
of course).
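
A minimal sketch of the behaviour described above, assuming Python 3 and only the standard library. The names split_keep_quotes, QUOTED, TOKEN_RE and QUOTED_RE are made up for illustration. A regular expression keeps each double-quoted run in one piece, ast.literal_eval() interprets its escape sequences the way Python would, and the outer quotes are put back afterwards:

```python
import ast
import re

# A double-quoted run (backslash escapes allowed inside), or a run of
# non-whitespace characters.
QUOTED = r'"(?:\\.|[^"\\])*"'
TOKEN_RE = re.compile(QUOTED + r'|\S+')
QUOTED_RE = re.compile(QUOTED)

def split_keep_quotes(line):
    """Hypothetical helper: split on whitespace, keeping quoted parts whole."""
    tokens = []
    for tok in TOKEN_RE.findall(line):
        if QUOTED_RE.fullmatch(tok):
            # Let Python interpret the escapes, then restore the outer quotes.
            # Assumption: the escapes are valid Python; a malformed one such
            # as "\x4" makes literal_eval() raise an exception.
            tok = '"' + ast.literal_eval(tok) + '"'
        tokens.append(tok)
    return tokens
```

With this sketch, an unterminated quote is not treated specially: it falls through to the plain whitespace split.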

I have begun to solve this problem by writing what amounts to a custom 
split() function (I call it hsplit()), which is a DFA that implements 
part of Python's string lexical analysis.  Code shown below.

The point of this in the context of the hex editor is that the user 
should be able to enter hex bytes without a prefix such as "0x", 
simply as "0A 1B 2C" etc., but also be able to input a string 
without having to type in hex ASCII codes.  Hence the following input 
would be valid (the 'e' is the edit command to the editor):

-e 01 02 "a string with newline\n" 3d 4e 5f
-p
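
To illustrate how such a line might end up as bytes in the editor's buffer, here is a rough sketch in Python 3. The helper tokens_to_bytes is hypothetical. It assumes the command letter has already been removed, that the tokens are already split with escapes decoded, that every bare token is a single hex byte, and that quoted text is stored as Latin-1:

```python
def tokens_to_bytes(tokens):
    """Hypothetical helper: turn already-split edit arguments into raw bytes."""
    out = bytearray()
    for tok in tokens:
        if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
            # Quoted token: drop the outer quotes, store the characters.
            # Assumption: one byte per character (Latin-1).
            out.extend(tok[1:-1].encode('latin-1'))
        else:
            # Bare token: one byte written as hex digits, e.g. "3d" or "0A".
            # int() raises ValueError for non-hex text, and bytearray.append()
            # raises ValueError for values outside 0..255.
            out.append(int(tok, 16))
    return bytes(out)
```

For the example line above, the arguments after 'e' would come out as the bytes 01 02, the text of the string followed by a newline, then 3d 4e 5f.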


Is there a simpler way?

----------------------------------------------------------------
HSTRIP_NONE = 0
HSTRIP_IN_WORD = 1
HSTRIP_IN_QUOTE = 2
HSTRIP_IN_ESC = 3

def hsplit(string):
     lst = []
     word = []
     state = HSTRIP_NONE # not in word
     for c in string:

         if state == HSTRIP_NONE:
             if c == '"':
                 word.append(c)
                 state = HSTRIP_IN_QUOTE
             elif c != ' ':
                 word.append(c)
                 state = HSTRIP_IN_WORD
             # else c == ' ', so pass
         elif state == HSTRIP_IN_QUOTE:
             if c == '"':
                 word.append(c)
                 lst.append(''.join(word))
                 word = []
                 state = HSTRIP_NONE
             elif c == '\\':
                 state = HSTRIP_IN_ESC
             else:
                 word.append(c)
         elif state == HSTRIP_IN_ESC:
             if c == '\\':
                 word.append(c)
                 state = HSTRIP_IN_QUOTE
             elif c == '"':
                 word.append(c)
                 state = HSTRIP_IN_QUOTE
             elif c == 'n':
                 word.append('\n')
                 state = HSTRIP_IN_QUOTE
             else: # c == non escape or quote
                 # for unrecognized escape, just put in verbatim
                 word.append('\\')
                 word.append(c)
                 state = HSTRIP_IN_QUOTE
         else: # if state == HSTRIP_IN_WORD
             if c == ' ' or c == '"':
                 lst.append(''.join(word))
                 if c == '"':
                     word = [c]
                     state = HSTRIP_IN_QUOTE
                 else:
                     word = []
                     state = HSTRIP_NONE
             else:
                 word.append(c)
     # this only happens if you run out of chars in string before a
     # state change:
     if word: lst.append(''.join(word))
     return lst



----------------------------------------------------------------


-- 
Good day!

________________________________________
Christopher R. Carlen
Principal Laser&Electronics Technologist
Sandia National Laboratories CA USA
crcarleRemoveThis at BOGUSsandia.gov
NOTE, delete texts: "RemoveThis" and
"BOGUS" from email address to reply.


