simple string parsing ?
Alexis Roda
alexis.roda at urv.es
Thu Sep 9 10:57:52 EDT 2004
TAG wrote:
> Hi,
>
> I am new to python and would like to parse a string, well actually a
> formula and get the stuff grouped together
> eg:
>
> if I have :
>
> =+GC142*(GC94+0.5*sum(GC96:GC101))
>
> and I want to get :
>
> ['=', '+', 'GC142', '*', '(', 'GC94', '+', '0.5', '*', 'sum', '(',
> 'GC96', ':', 'GC101', ')', ')']
>
> how can I get this ??????
The most generic way is to use a lexical analyzer tool. I have not
tested it, but Python comes with shlex in the standard library.
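As a rough sketch of the shlex route (not something from the original exchange, and only checked against this one formula): shlex treats letters, digits and underscore as word characters by default, so the decimal point has to be added by hand to keep '0.5' in one piece. The function name shlex_tokens is made up for the illustration.

```python
import shlex

def shlex_tokens(formula):
    """Split a formula into tokens with the standard-library shlex lexer."""
    lexer = shlex.shlex(formula)
    # Assumption: numbers may contain a decimal point, so treat '.' as
    # part of a word. Every other non-word character becomes its own token.
    lexer.wordchars += '.'
    return list(lexer)

print(shlex_tokens('=+GC142*(GC94+0.5*sum(GC96:GC101))'))
```

Note that shlex also gives special meaning to quotes and '#' (comments), which may or may not be what you want for spreadsheet formulas.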
For the example you sent, it seems enough to iterate over the input one
char at a time, grouping chars until you find one that is not a letter,
a digit or the decimal point.
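
    tokens = []
    current_token = ''
    for char in text:    # text holds the formula string
        if not is_delimiter(char):
            current_token += char
        else:
            if current_token:
                tokens.append(current_token)
                current_token = ''
            tokens.append(char)
    # flush the last token if the input does not end with a delimiter
    if current_token:
        tokens.append(current_token)
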
Define is_delimiter() so that it returns true when its argument is
neither alphabetic, numeric, nor the decimal point.
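A self-contained version of the loop above might look like the following. It is only a sketch: the function name tokenize is an assumption, is_delimiter() is one possible definition matching the description, and dropping whitespace is an extra choice the original formula does not need.

```python
def is_delimiter(char):
    """True when char is not a letter, a digit or the decimal point."""
    return not (char.isalnum() or char == '.')

def tokenize(text):
    """Group runs of non-delimiter chars; each delimiter is its own token."""
    tokens = []
    current_token = ''
    for char in text:
        if not is_delimiter(char):
            current_token += char
        else:
            if current_token:
                tokens.append(current_token)
                current_token = ''
            # Assumption: whitespace separates tokens but is not kept.
            if not char.isspace():
                tokens.append(char)
    # Flush the last token when the input does not end with a delimiter.
    if current_token:
        tokens.append(current_token)
    return tokens

print(tokenize('=+GC142*(GC94+0.5*sum(GC96:GC101))'))
```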
The "right" way would be something like:

    what = ''    # what I have found until now
    for char in text:
        if is_meaningful(what + char):
            what = what + char    # keep looping
        else:
            # 'what' is a token: do something with it
            what = char
    # when the loop ends, 'what' holds the last token

is_meaningful() encapsulates the lexical rules for the language.
The execution will be:

    =         is meaningful
    =+        is not -> token =
    +
    +G        no -> token +
    G         yes
    GC        yes
    GC1       yes
    GC14      yes
    GC142     yes
    GC142*    no -> token GC142
    *
    ...

and so on
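To make the second approach concrete, here is a minimal sketch in which is_meaningful() is a single regular expression. The token rules (identifiers, plain decimal numbers, a handful of one-char operators) are my guess at what the formula language needs, not a real spreadsheet grammar, and the function name tokenize is again hypothetical. It assumes the input contains no whitespace, as in the example.

```python
import re

# Assumed lexical rules: an identifier, a decimal number, or one operator.
TOKEN_RULES = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+\.?\d*|[=+\-*/():,]")

def is_meaningful(text):
    """True when text, taken as a whole, is a valid token."""
    return TOKEN_RULES.fullmatch(text) is not None

def tokenize(text):
    tokens = []
    what = ''  # what I have found until now
    for char in text:
        if is_meaningful(what + char):
            what = what + char      # the token keeps growing
        else:
            if what:
                tokens.append(what)  # 'what' is a complete token
            what = char              # start the next token
    if what:
        tokens.append(what)          # the last token is still pending
    return tokens

print(tokenize('=+GC142*(GC94+0.5*sum(GC96:GC101))'))
```

This only works when every prefix of a token is itself meaningful ('0.' has to be accepted on the way to '0.5'), which is why the number rule allows a trailing point.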
HTH
--
////
(@ @)
----------------------------oOO----(_)----OOo--------------------------
<> Ojo por ojo y el mundo acabara ciego
/\ Alexis Roda - Universitat Rovira i Virgili - Reus, Tarragona (Spain)
-----------------------------------------------------------------------