[Tutor] help with re module and parsing data

Mon Mar 7 14:24:36 CET 2011

>> import re
>> file = open('file.txt','r')
>> file2 = open('newfile.txt','w')
>>
>> LineFile = ' '
>>
>> for line in file:
>>    LineFile += line
>>
>> StripRcvdCnt = re.compile('(P\w+\S\Content|Re\w+\S\Content)')
>>
>> FindRcvdCnt = re.findall(StripRcvdCnt, LineFile)
>>
>> for SrcStr in FindRcvdCnt:
>>    file2.write(SrcStr)
>>
>
> Is there any particular reason why you're using regular expressions
> for this?  You are already iterating over the lines in your first for
> loop.  You can just make the tests you need there.
>
> for line in file:
>  if 'Recvd-Content' in line or 'Published-Content' in line:
>    file2.write(line)
>
> Your regular expression seems like it will match a lot more strings
> than the two you mentioned earlier.
>
> Also, 'file' is a Python 2 built-in.  It would be best to use a different
> name for your variable.
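A minimal sketch of the loop-based approach suggested above, assuming the two markers are the literal strings 'Recvd-Content' and 'Published-Content'. The helper names matching_lines and copy_matching are hypothetical. Unlike the original findall() loop, which wrote only the matched text, this copies whole matching lines.

```python
# Marker strings taken from the names mentioned in the thread (assumed literal).
MARKERS = ('Recvd-Content', 'Published-Content')


def matching_lines(lines, markers=MARKERS):
    """Yield each line that contains at least one of the markers."""
    for line in lines:
        # Plain substring tests; no regular expression involved.
        if any(marker in line for marker in markers):
            yield line


def copy_matching(src_path, dst_path):
    """Copy every matching line from src_path to dst_path."""
    # 'infile' and 'outfile' avoid reusing the name 'file'; the with
    # statement closes both files even if an error occurs.
    with open(src_path) as infile, open(dst_path, 'w') as outfile:
        for line in matching_lines(infile):
            outfile.write(line)
```

With the file names from the original post, this would be called as copy_matching('file.txt', 'newfile.txt').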

i have a few suggestions as well:

1) only class names should be TitleCased, not ordinary variables, so
LineFile should be linefile, line_file, or lineFile.

2) you don't need to read in the file one line at a time. you can just
do linefile = f.read(), where f is the open file object; this reads the
entire file in as one massive string.

3) you don't need to compile your regex (unless you will be using this
pattern over and over within one execution of this script). you can
just call findall() directly:

    findrcvdcnt = re.findall(r'(P\w+\SContent|Re\w+\SContent)', linefile)
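Suggestions 2 and 3 together might look roughly like this sketch. It reuses the pattern from the original post, written as a raw string with `\C` reduced to a plain `C`: older Pythons silently treated `\C` as a literal C, but recent Python 3 versions reject it as a bad escape. The names find_markers and extract are hypothetical, and writing a newline after each match is an assumption (the original loop wrote the matches back to back).

```python
import re

# Pattern from the original post, as a raw string, with "\C" written as "C".
PATTERN = r'(P\w+\SContent|Re\w+\SContent)'


def find_markers(text):
    """Return every substring of text matched by PATTERN."""
    # re.findall() takes the pattern string directly; the re module caches
    # recently used compiled patterns, so re.compile() is optional here.
    return re.findall(PATTERN, text)


def extract(src_path, dst_path):
    """Read the whole file in one go and write each match on its own line."""
    with open(src_path) as f:
        linefile = f.read()  # the entire file as a single string
    with open(dst_path, 'w') as out:
        for match in find_markers(linefile):
            out.write(match + '\n')
```

Called as extract('file.txt', 'newfile.txt'), this writes one match per line. Note that the pattern also matches made-up text such as 'Rejected:Content', which illustrates the over-matching pointed out in the earlier reply.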

hope this helps!
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
    http://corepython.com

wesley.chun : wescpy-gmail.com : @wescpy
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com