[Tutor] about python function +reg expression
Abdirizak abdi
a_abdi406@yahoo.com
Sun Mar 23 00:59:01 2003
hi everyone,
I am working on a function that does the following:
the function takes a string as follows:
<EQN/>what it is in here<EQN/> and
first it filters to get the middle part of the string:
as follows:--- what it is in here--- and then I want to tag each token
as follows:
<W>what</W> <W>it</W> <W>is</W> <W>in</W> <W>here</W>
and finally it returns:
<EQN/> <W>what</W> <W>it</W> <W>is</W> <W>in</W> <W>here</W> <EQN/>
My problem:
I am filtering a string using the regular expression shown below,
from : <EQN/> what it is in here <EQN/>
to : what it is in here
Regular expression result:
>>>
>>>
>>> import re
>>> test = '<EQN/> of a conditioned word <EQN/> '
>>> text =re.compile(r"(?<=<EQN/>).*?(?=<EQN/>)")
>>> X = text.findall(test)
>>> print X
[' of a conditioned word '] ---> it extracts what I want that is fine
My problem is how to turn this into a list such as ['of', 'a', 'conditioned', 'word'], so that I can tag each item with <W>....</W> in a loop, as described above.
I have also attached EQNfunc.py to this e-mail for reference.
Simply calling X.split() doesn't work.
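A minimal sketch of one possible way around this, assuming the obstacle is that findall() returns a list of strings (and a list has no split() method), so the split has to be applied to the matched string inside the list. The names pattern, tokens, tagged and result are only illustrative:

```python
import re

test = '<EQN/> of a conditioned word <EQN/> '
pattern = re.compile(r"(?<=<EQN/>).*?(?=<EQN/>)")

X = pattern.findall(test)   # a list of strings, here with one element
tokens = X[0].split()       # split the first match (a string), not the list

# wrap every token in <W>...</W>
tagged = ['<W>%s</W>' % tok for tok in tokens]

# put the <EQN/> markers back around the tagged tokens
result = '<EQN/> ' + ' '.join(tagged) + ' <EQN/>'
print(result)
```

This only handles the first match; if findall() returns an empty list, X[0] raises an IndexError, so a real function would need to check for that case.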
thanks in advance
Attachment: EQNfunc.py
import re

def EQNfunc(text1):
    """ this function takes a string as follows:
        <EQN/>what it is in here<EQN/>
    first it filters to get the middle part of the string
    (what it is in here) and then it tags each token as follows:
        <W>what</W> <W>it</W> <W>is</W> <W>in</W> <W>here</W>
    """
    initial_tag = "<EQN/>"
    spacing = " "
    tag_W = "W"

    # match the text between the two <EQN/> markers
    buf = re.compile(r"(?<=<EQN/>).*?(?=<EQN/>)")
    found = buf.findall(text1)      # findall returns a list of strings
    if not found:
        return text1                # no <EQN/> pair, nothing to tag
    temp = found[0].split()         # split the matched string, not the list

    # wrap each token in <W>...</W>
    for i in range(len(temp)):
        temp[i] = '<%s>%s</%s>' % (tag_W, temp[i], tag_W)

    # join the text
    joined_text = spacing.join(temp)
    last_result = initial_tag + spacing + joined_text + spacing + initial_tag
    return last_result

#-------------------------------------------
test = '<EQN/> of a conditioned word <EQN/> '
Y = EQNfunc(test)
print(Y)
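
If the input may contain more than one <EQN/>...<EQN/> span, a variation could use re.sub() with a replacement function instead of findall(). This is only a sketch under that assumption; tag_words and tag_all_eqn are hypothetical names, and the pattern here consumes both markers rather than using lookarounds, so that text between two separate spans is left untouched:

```python
import re

def tag_words(match):
    # hypothetical helper: match.group(1) is the text between the two markers
    words = match.group(1).split()
    return '<EQN/> ' + ' '.join('<W>%s</W>' % w for w in words) + ' <EQN/>'

def tag_all_eqn(text):
    # re.sub calls tag_words once for every <EQN/>...<EQN/> span it finds
    return re.sub(r"<EQN/>(.*?)<EQN/>", tag_words, text)

sample = '<EQN/> of a <EQN/> and <EQN/>conditioned word<EQN/>'
tagged_sample = tag_all_eqn(sample)
print(tagged_sample)
```

Text outside the markers ("and" in the sample) is passed through unchanged.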