Problem with re module

Ethan Furman ethan at stoneleaf.us
Tue Mar 22 19:26:21 EDT 2011


John Harrington wrote:
> Here's a script that illustrates the problem.  Any help would be
> appreciated!:
> 
> #BEGIN SCRIPT
> import re
> 
> outlist = []
> myfile  = "raw.tex"
> 
> fin = open(myfile, "r")
> lineList = fin.readlines()
> fin.close()
> 
> for i in range(0,len(lineList)):
> 
>      lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n
> \2',lineList[i])
> 
>      outlist.append(lineList[i])
> 
> fou = open(myfile, "w")
> for i in range(len(outlist)):
>    fou.write(outlist[i])
> fou.close
> #END SCRIPT
> 
> And the file raw.tex:
> 
> %BEGIN TeX FILE
> \begin{document}
> This line should remain right after the above line in the output, but
> doesn't
> 
> \begin{document}Extra stuff here should appear below the begin line
> and does in the output.
> %END TeX FILE

Here's the important tidbit (note the (.+) where your version had ([^\n])):

     re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)

From the docs:
'.'
(Dot.) In the default mode, this matches any character except a newline. 
If the DOTALL flag has been specified, this matches any character 
including a newline.

'+'
Causes the resulting RE to match 1 or more repetitions of the preceding 
RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will 
not match just ‘a’.
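
For what it's worth, here is a quick check of how those two pieces behave together on single lines, as a minimal sketch. The names PATTERN, REPLACEMENT, bare and trailing are just mine for illustration, and the sample strings are made up rather than read from raw.tex:

```python
import re

# Same pattern and replacement as in the tidbit above.
PATTERN = r'(\\begin{document})(.+)'
REPLACEMENT = r'\1\n\n\2'

# Only the newline follows the tag, so '.+' has nothing to match
# and the line comes back unchanged.
bare = re.sub(PATTERN, REPLACEMENT, '\\begin{document}\n')

# Text follows the tag on the same line, so it gets pushed down
# below a blank line.
trailing = re.sub(PATTERN, REPLACEMENT, '\\begin{document}Extra stuff\n')

print(repr(bare))
print(repr(trailing))
```

Because '.' stops at the newline, the trailing '\n' of each line is never captured and stays where it was.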


And here's the entire program, a bit more pythonically:

8<---------------------------------------------------------------
import re

outlist = []
myfile  = "raw.tex"

fin = open(myfile, "r")
lineList = fin.readlines()
fin.close()

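for line in lineList:
    # '.+' needs at least one character after the tag, so a bare
    # \begin{document} line passes through unchanged
    line = re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
    outlist.append(line)

fou = open(myfile, "w")
for line in outlist:
    fou.write(line)
fou.close()   # needs the parentheses; a bare fou.close never calls it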
8<---------------------------------------------------------------
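
If you want to go a step further, the same idea can be sketched with "with" blocks, so the files get closed even if something goes wrong partway through. This is only one way to arrange it; split_after_begin and rewrite_file are hypothetical helper names of mine, and the sample list is a shortened stand-in for raw.tex:

```python
import re

def split_after_begin(lines):
    # Hypothetical helper: push any text that follows the tag on the
    # same line down below a blank line; other lines pass through as-is.
    return [re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
            for line in lines]

def rewrite_file(path):
    # Read everything first, then overwrite the same file in place.
    with open(path, "r") as fin:
        lines = fin.readlines()
    with open(path, "w") as fou:
        fou.writelines(split_after_begin(lines))

# Try the substitution on in-memory lines before touching a real file.
sample = [
    "%BEGIN TeX FILE\n",
    "\\begin{document}\n",
    "This line should remain right after the above line\n",
    "\n",
    "\\begin{document}Extra stuff here\n",
    "%END TeX FILE\n",
]
result = split_after_begin(sample)
print("".join(result))
```

Keeping the substitution in its own function means you can test it on a list of strings without overwriting raw.tex each time.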

Hope this helps!

~Ethan~


