simple string parsing?

Alexis Roda alexis.roda at urv.es
Thu Sep 9 10:57:52 EDT 2004


TAG wrote:
> Hi,
> 
> I am new to python and would like to parse a string, well actually a
> formula and get the stuff grouped together
> eg:
> 
> if  I have :
> 
> =+GC142*(GC94+0.5*sum(GC96:GC101))
> 
> and I want to get :
> 
> ['=', '+', 'GC142', '*', '(', 'GC94', '+', '0.5', '*', 'sum', '(',
> 'GC96', ':', 'GC101', ')', ')']
> 
> how can I get this ??????

The most generic way is to use a lexical analyzer tool. I have not 
tested it, but Python comes with the shlex module.
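Here is a rough sketch of how shlex might be used on this input. It assumes Python 3 and shlex's default non-POSIX mode. In that mode, every character outside wordchars comes back as a token of its own, so the only change needed here is to add '.' to wordchars so that 0.5 stays together. The function name shlex_tokenize is just for illustration:

```python
import shlex

def shlex_tokenize(formula):
    lexer = shlex.shlex(formula)   # non-POSIX mode is the default
    lexer.wordchars += '.'         # keep decimal numbers such as 0.5 in one token
    return list(lexer)             # iteration stops at the end of the input

print(shlex_tokenize("=+GC142*(GC94+0.5*sum(GC96:GC101))"))
```

shlex was written for shell-like input, so '#' starts a comment and quote characters are treated specially. I would not rely on it for anything beyond simple formulas like this one.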

For the example you sent, it seems enough to iterate over the input one 
character at a time, grouping characters until you find one that is not 
alphabetic, numeric or a decimal point.

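def tokenize(text):
    tokens = []
    current_token = ''
    for char in text:
        if not is_delimiter(char):
            current_token += char    # still inside a name or number
        else:
            if current_token:        # close the pending token first
                tokens.append(current_token)
                current_token = ''
            tokens.append(char)      # the delimiter is a token by itself
    if current_token:                # flush a token that ends the input
        tokens.append(current_token)
    return tokens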

Define is_delimiter() so that it returns True when its argument is neither 
alphanumeric nor the decimal point.
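A minimal is_delimiter() could look like this. It assumes that str.isalnum() is an acceptable reading of "alphabetic or numerical". Note that isalnum() also accepts non-ASCII letters and digits:

```python
def is_delimiter(char):
    # Letters, digits and '.' build up names like GC142 and numbers like 0.5;
    # anything else (operators, brackets, ':', whitespace) is a delimiter.
    return not (char.isalnum() or char == '.')
```

With this definition, a space in the formula would come out as a token of its own. You may want to strip spaces from the formula first.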

The "right" way would be something like:

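def tokenize_by_rules(text):
    tokens = []
    what = ''    # what_i_have_found_until_now, "what" for short
    for char in text:
        if is_meaningful(what + char):
            what = what + char         # still a valid token: keep looping
        else:
            if what:
                tokens.append(what)    # 'what' is a complete token
            what = char                # start the next token with char
    if what:
        tokens.append(what)            # don't lose the last token
    return tokens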

is_meaningful() encapsulates the lexical rules of the language: it tells 
whether a string is still a valid token.

The execution would be:

= is meaningful
=+ is not -> token =
+
+G no -> token +
G yes
GC yes
GC1 yes
GC14 yes
GC142 yes
GC142* no -> token GC142
*
...
and so on
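One possible is_meaningful() uses a regular expression. The token rules below are my assumption based on the example formula, not a complete grammar for it: a name is letters followed by optional digits, a number has an optional decimal part, and an operator is a single punctuation character.

```python
import re

# Hypothetical token rules for these formulas:
#   a name such as GC142 or sum (letters, then optional digits),
#   a number with an optional decimal part ("0." is allowed so "0" can grow into "0.5"),
#   or a single operator/punctuation character.
TOKEN_RE = re.compile(r'[A-Za-z]+[0-9]*|[0-9]+(?:\.[0-9]*)?|[-=+*/():]')

def is_meaningful(text):
    # True if the whole string is (still) a valid token.
    return TOKEN_RE.fullmatch(text) is not None
```

This greedy scheme only works when every prefix of a valid token is itself meaningful. That is why the number rule accepts "0." on the way to "0.5".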



HTH
-- 
                                    ////
                                   (@ @)
----------------------------oOO----(_)----OOo--------------------------
<>               An eye for an eye and the world will end up blind
/\ Alexis Roda - Universitat Rovira i Virgili - Reus, Tarragona (Spain)
-----------------------------------------------------------------------




More information about the Python-list mailing list