[Tutor] about python + regular expression
Abdirizak abdi
a_abdi406@yahoo.com
Thu Mar 20 09:22:02 2003
Hi everyone,
Thanks, Gregor and Michael, for your contributions.
While I get this:
>>> buf = re.compile("[a-zA-Z]+")
>>> buf.findall(str)
['Data', 'sparseness', 'is', 'an', 'inherent', 'problem', 'in',
'statistical', 'methods', 'for', 'natural', 'language', 'processing']
>>>
this is the result that I want:
['Data', 'sparseness', 'is', 'an', 'inherent', 'problem', 'in',
'statistical', 'methods', 'for', 'natural', 'language', 'processing', '.']
Gregor, yes, that was what I wanted, but the result should also include the full stop, commas, double quotes and single quotes. I need each of these tokenized individually, like the other tokens that appear in the list. Do I have to do it with a separate RE and a conditional statement, or can it be done with a single RE (regular expression)?
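As far as I can tell, a single RE with alternation might be enough for this. Below is a minimal sketch, not a definitive answer: the sample sentence is reconstructed from the output above, the name `token_re` is just illustrative, and the punctuation set is assumed to be only the four characters mentioned (full stop, comma, double quote, single quote).

```python
import re

# Sample sentence, reconstructed from the output shown above (an assumption).
text = ("Data sparseness is an inherent problem in statistical "
        "methods for natural language processing.")

# One pattern, two alternatives:
#   [a-zA-Z]+  a run of letters (a word)
#   [.,"']     exactly one punctuation character, so each becomes its own token
token_re = re.compile(r"""[a-zA-Z]+|[.,"']""")

tokens = token_re.findall(text)
print(tokens)
```

Because the character class matches one character at a time, two punctuation marks in a row come out as two separate tokens. Any other punctuation (semicolons, question marks and so on) is silently skipped unless it is added to the class.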
Another question:
When reading text from a file, is it really necessary to scan it with a while loop, or is the following enough, so that I can then scan the result with a loop to manipulate it? What is the real difference?
infile = open('file.txt')
buffer = infile.readline()
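To make the difference concrete, here is a small sketch comparing the ways of reading, assuming a current Python 3 and a throwaway three-line file created just for the example (the file contents and the names `path`, `first`, `lines` and `whole` are made up for illustration). `readline()` returns only the next single line, so on its own it does not cover the whole file.

```python
import os
import tempfile

# Create a small throwaway file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# readline() returns just one line (including its newline).
with open(path) as infile:
    first = infile.readline()

# Looping over the file object visits every line, one at a time.
lines = []
with open(path) as infile:
    for line in infile:
        lines.append(line.rstrip("\n"))

# read() returns the whole file as a single string.
with open(path) as infile:
    whole = infile.read()

print(repr(first))
print(lines)
print(len(whole))
```

So a loop (or `read()` / `readlines()`) is needed to get past the first line; which one fits best depends on whether the text should be processed line by line or as one string.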