[Tutor] about Python + regular expressions

Thu Mar 20 09:22:02 2003

Hi everyone,

Thanks, Gregor and Michael, for your contributions.

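With this pattern I get only the words:

 >>> import re
 >>> text = "Data sparseness is an inherent problem in statistical methods for natural language processing."
 >>> buf = re.compile("[a-zA-Z]+")
 >>> buf.findall(text)
['Data', 'sparseness', 'is', 'an', 'inherent', 'problem', 'in', 
'statistical', 'methods', 'for', 'natural', 'language', 'processing']
 >>>

but this is the result that I want: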

['Data', 'sparseness', 'is', 'an', 'inherent', 'problem', 'in', 
'statistical', 'methods', 'for', 'natural', 'language', 'processing', '.']

Gregor, yes, that was what I wanted, but I also need the full stops, commas, double quotes and single quotes. Each of these has to be its own token in the list, just like the words. Do I have to use a separate RE for the punctuation and then check it with a conditional statement, or can it all be done with a single regular expression (RE)?
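A single RE should be enough: the alternation operator `|` lets one pattern match either a run of letters or one punctuation character, and `findall` returns every match in order. The sketch below assumes the only punctuation to keep is `.`, `,`, `"` and `'`; the names `TOKEN_RE` and `tokenize` are made up for illustration.

```python
import re

# One pattern with two alternatives, tried left to right at each position:
#   [a-zA-Z]+   a run of letters (a word)
#   [.,"']      exactly one full stop, comma, double quote or single quote
# Any other character (spaces, digits, other symbols) is simply skipped.
TOKEN_RE = re.compile(r"""[a-zA-Z]+|[.,"']""")

def tokenize(text):
    """Return words and the punctuation marks . , " ' as separate tokens."""
    return TOKEN_RE.findall(text)

sentence = ("Data sparseness is an inherent problem in statistical "
            "methods for natural language processing.")
print(tokenize(sentence))
print(tokenize('He said "hi, there".'))
```

Note that a contraction such as `don't` comes out as three tokens (`'don'`, `"'"`, `'t'`); if that is not wanted, the pattern needs an extra alternative for apostrophes inside words.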

Another question:

When reading text from a file, is it really necessary to scan it with a while loop, or is the following enough, with a loop afterwards to process what was read? What is the real difference?

infile = open('file.txt')     # no spaces inside the file name
buffer = infile.readline()    # reads only the first line
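The difference is mostly how much of the file each call gives back: `readline()` returns just one line, `read()` returns the rest of the file as a single string, and looping over the file object hands you one line per pass. The minimal sketch below assumes Python 3 (for the `with` blocks); the sample file and its three lines are made up for illustration.

```python
import os
import tempfile

# Create a small sample file (made-up content, for illustration only).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# 1. readline() returns only the next line, including its trailing "\n".
with open(path) as infile:
    first_line = infile.readline()

# 2. read() returns everything up to the end of the file as one string.
with open(path) as infile:
    whole_text = infile.read()

# 3. Iterating over the file object yields one line per loop pass,
#    so the whole file never has to be held in memory at once.
for_lines = []
with open(path) as infile:
    for line in infile:
        for_lines.append(line.rstrip("\n"))

# 4. The explicit while-loop version of (3): keep calling readline()
#    until it returns "" (the empty string means end of file).
while_lines = []
with open(path) as infile:
    while True:
        line = infile.readline()
        if not line:
            break
        while_lines.append(line.rstrip("\n"))

os.remove(path)

print(repr(first_line))   # 'first line\n'
print(for_lines)          # ['first line', 'second line', 'third line']
```

Loops (3) and (4) do the same work; the `for` loop is simply the shorter, more common idiom. Calling `readline()` once, as in the snippet above, is only enough if the file has a single line.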
