Questions about regex

Bobby bobby.house at gmail.com
Fri May 29 21:27:32 EDT 2009


On May 29, 1:26 pm, Jared.S.Ba... at gmail.com wrote:
> Hello,
>
> I'm new to python and I'm having problems with a regular expression. I
> use textmate as my editor and when I run the regex in textmate it
> works fine, but when I run it as part of the script it freezes. Could
> anyone help me figure out why this is happening and how to fix it.
> Here is the script:
>
> ======================================================
> # regular expression search and replace
> import sys, os, re, string, csv
>
> #Open the file and taking its data
> myfile=open('Steve_query3.csv') #Steve_query_test.csv
> #create an error flag  to loop the script twice
> #store all file's data in the string object 'text'
> myfile.seek(0)
> text = myfile.read()
>
> for i in range(2):
>         #def textParse(text, reRun):
>         print 'how many times is this getting executed', i
>
>         #Now to create the newfile 'test' and write our 'text'
>         newfile = open('Steve_query3_out.csv', 'w')
>         #open the new file and set it with 'w' for "write"
>         #loop trough 'text' clean them up and write them into the 'newfile'
>                         #sub(   pattern, repl, string[, count])
>                         #"sub("(?i)b+", "x", "bbbb BBBB")" returns 'x x'.
>         text = re.sub('(\<(/?[^\>]+)\>)', "", text)#remove the HTML
>         text = re.sub('/<!--(.|\s)*?-->/', "", text) #remove comments  <!--[^
> \-]+-->
>         text = re.sub('\/\*(.|\s)*?;}', "", text) #remove css formatting
>         #remove a bunch of word formatting yuck
>         text = re.sub("&nbsp;", " ", text)
>         text = re.sub("&lt;", "<", text)
>         text = re.sub("&gt;", ">", text)
>         text = re.sub("&quot;|&rquot;|&ldquo;", "\'", text)
> #===================================
> #The two following lines are the ones giving me the problems
>         text = re.sub("w:(.|\s)*?\n", "", text)
>         text = re.sub("UnhideWhenUsed=(.|\s)*?\n", "", text)
> #===========================================
>         text = re.sub(re.compile('^\r?\n?$', re.MULTILINE), '', text) #remove
> the extra whitespace
>         #now write out the new file and close it
>         newfile.write(text)
>         newfile.close()
>
>         #open the newfile and run the script again
>         #Open the file and taking its data
>
>         myfile=open('Steve_query3_out.csv') #Steve_query_test.csv
>         #store all file's data in the string object 'text'
>         myfile.seek(0)
>         text = myfile.read()
>
> Thanks for the help,
>
> -Jared

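Can you give a string that you would expect the regex to match, and
what the expected result would be? Currently, the interesting part of
the regex, (.|\s)*?, lazily matches a run of any characters, newlines
included, of any length. The two alternatives overlap, which is
redundant and makes the pattern more confusing than it needs to be: .
already matches everything that \s matches except a newline (unless
re.DOTALL is set), so spaces and tabs can be matched by either branch.
When no match is found, that overlap can force the engine to try a
huge number of combinations, which may be why the script appears to
freeze. Or maybe you just need to escape the . because you meant it
to be a literal.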
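
To illustrate the point about . and \s, here is a minimal sketch. It
assumes the goal is to delete everything from the marker through the
end of that line; the sample string is made up, since the real file
contents were not posted.

```python
import re

# Hypothetical sample text; the real CSV contents are not shown in the thread.
sample = (
    "keep this\n"
    "w:LsdException Locked=false\n"
    "UnhideWhenUsed=false Name=x\n"
    "keep that\n"
)

# "." matches any character except a newline unless re.DOTALL is given,
# while "\s" matches whitespace, newlines included.
assert re.match(r".", "\n") is None
assert re.match(r"\s", "\n") is not None
assert re.match(r".", "\n", re.DOTALL) is not None

# Assumed intent: drop everything from the marker through the end of its line.
# [^\n]* cannot overlap with the trailing \n, so each position can only be
# matched one way and there is nothing for the engine to backtrack over.
cleaned = re.sub(r"w:[^\n]*\n", "", sample)
cleaned = re.sub(r"UnhideWhenUsed=[^\n]*\n", "", cleaned)

print(repr(cleaned))
```

If the match really is meant to span several lines, passing re.DOTALL
with a plain .*? would be another option in place of (.|\s)*?.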


