Problem with re module

John Harrington beartiger.all at gmail.com
Tue Mar 22 15:30:58 EDT 2011


On Mar 22, 12:07 pm, Benjamin Kaplan <benjamin.kap... at case.edu> wrote:
> On Tue, Mar 22, 2011 at 2:40 PM, John Harrington
> <beartiger.... at gmail.com> wrote:
> > On Mar 22, 11:16 am, John Bokma <j... at castleamber.com> wrote:
> >> John Harrington <beartiger.... at gmail.com> writes:
> >> > I'm trying to use the following substitution,
>
> >> >      lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
>
> >> > I intend this to match any string "\begin{document}" that isn't
> >> > followed by a line ending.  If there's no line ending, I want to place
> >> > two newlines between the string and the character that follows it.
>
> >> > However, this places newlines even when the string is followed
> >> > directly by a line ending.  Can someone explain to me why this
> >> > match is not behaving as I intend it to, especially the ([^$])?
>
> >> [^$] matches: not a $ character
>
> >> You might want [^\n]
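A quick check of John Bokma's point, as a small Python 3 sketch (the thread uses Python 2, but re behaves the same way here). Inside a character class "$" is an ordinary character, so [^$] happily matches the newline at the end of the line, while [^\n] does not.

```python
import re

line = "\\begin{document}\n"  # the first line of raw.tex, as readlines() returns it

# In a character class "$" is a literal dollar sign, so [^$] means
# "any character except '$'", and that includes the trailing "\n".
with_dollar = re.search(r'(\\begin{document})([^$])', line)

# [^\n] excludes the newline itself, so this pattern finds nothing.
with_newline = re.search(r'(\\begin{document})([^\n])', line)

print(with_dollar is not None)  # True
print(with_newline is None)     # True
```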
>
> > Thank you, John.
>
> > I thought that when you use "r" before the regex, $ matches an end of
> > line.  But, in any case, if I use "[^\n]" as you suggest I get the
> > same result.
>
> r before a string has nothing to do with regexes. It signals a raw
> string: backslash escape sequences are left as-is.
>
> >>> print 'a\tb'
> a       b
> >>> print r'a\tb'
> a\tb
>
> We use raw strings for regexes because otherwise you'd have to
> remember to double up all your backslashes, and double up your doubled-up
> backslashes when you really want a literal backslash.
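The raw-string point can be checked directly. This is a Python 3 sketch (print is a function there, unlike in the Python 2 session above). The r prefix only changes how the string literal is read; "$" as an end-of-line anchor is a regex feature either way.

```python
import re

# A raw string keeps backslashes as-is: r'\n' is two characters, '\' and 'n'.
raw = r'\n'
# An ordinary literal turns the escape into a single newline character.
cooked = '\n'
print(len(raw), len(cooked))  # 2 1

# re interprets the two-character sequence \n itself, so both spellings
# build a pattern that matches a newline.
print(bool(re.fullmatch(raw, '\n')), bool(re.fullmatch(cooked, '\n')))  # True True

# Matching one literal backslash needs the regex \\ ; as a raw string that
# is r'\\', otherwise it must be written '\\\\'.
print(r'\\' == '\\\\')                  # True
print(bool(re.fullmatch(r'\\', '\\')))  # True
```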
>
> > Here's a script that illustrates the problem.  Any help would be
> > appreciated!
>
> > #BEGIN SCRIPT
> > import re
>
> > outlist = []
> > myfile  = "raw.tex"
>
> > fin = open(myfile, "r")
> > lineList = fin.readlines()
> > fin.close()
>
> > for i in range(0,len(lineList)):
>
> >     lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n\2',lineList[i])
>
> >     outlist.append(lineList[i])
>
> > fou = open(myfile, "w")
> > for i in range(len(outlist)):
> >   fou.write(outlist[i])
> > fou.close()
> > #END SCRIPT
>
> > And the file raw.tex:
>
> > %BEGIN TeX FILE
> > \begin{document}
> > This line should remain right after the above line in the output, but
> > doesn't
>
> > \begin{document}Extra stuff here should appear below the begin line
> > and does in the output.
> > %END TeX FILE
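For comparison, the same script could be written more compactly in Python 3. This is only a sketch: the helper names add_breaks and rewrite_file are made up for illustration. It keeps the [^\n] pattern, so it reproduces the behaviour being asked about rather than changing it.

```python
import re

# Same substitution as the script above: a \begin{document} followed by a
# character other than "\n" gets two newlines inserted before that character.
PATTERN = re.compile(r'(\\begin{document})([^\n])')

def add_breaks(lines):
    """Apply the substitution to each line and return a new list."""
    return [PATTERN.sub(r'\1\n\n\2', line) for line in lines]

def rewrite_file(path):
    # "with" closes each file automatically, even if an error occurs.
    with open(path) as fin:
        lines = fin.readlines()
    with open(path, "w") as fout:
        fout.writelines(add_breaks(lines))

sample = ["\\begin{document}\n", "\\begin{document}Extra stuff here\n"]
print(add_breaks(sample))
```

Calling rewrite_file("raw.tex") would rewrite the file in place, as the original script does.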
>
> Works for me. Do you have a space after the \begin{document} or
> something? Because that would get moved. You might want to check for
> non-whitespace characters in the regex instead of just non-newlines.
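Ben's suggestion as a Python 3 sketch. The line with a trailing space is a made-up example: [^\n] accepts the space and moves it, while \S (any non-whitespace character) leaves that line alone but still splits a line with real text after the command.

```python
import re

# Hypothetical line with an invisible trailing space after the command.
line = "\\begin{document} \n"

# [^\n] accepts the space, so the space is pushed onto a line of its own.
with_class = re.sub(r'(\\begin{document})([^\n])', r'\1\n\n\2', line)
print(repr(with_class))  # '\\begin{document}\n\n \n'

# \S accepts only non-whitespace, so spaces, tabs and "\r" never match.
with_nonspace = re.sub(r'(\\begin{document})(\S)', r'\1\n\n\2', line)
print(repr(with_nonspace))  # '\\begin{document} \n'

# A line with real text after the command is still split as intended.
print(repr(re.sub(r'(\\begin{document})(\S)', r'\1\n\n\2', "\\begin{document}Extra\n")))
```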
>
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
>

Matching non-whitespace works, but it troubles me that I can't match a
non-end-of-line character.  No, there was no space after the string.
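One possible explanation, offered as an assumption since the thread never confirms it: if raw.tex was saved with Windows-style "\r\n" line endings and read without newline translation (for example Python 2's "r" mode on Linux or Mac OS X), then the character after \begin{document} is "\r", not "\n". That "\r" satisfies [^\n] but not \S, which would account for both observations. The sketch below simulates such a line and shows two ways around it: excluding both line-ending characters, or letting Python 3's text mode translate the line endings.

```python
import io
import re

# Hypothetical line as read, without newline translation, from a file
# saved with Windows-style "\r\n" line endings.
line = "\\begin{document}\r\n"

# "\r" is not "\n", so [^\n] still matches and the breaks get inserted.
m_newline = re.search(r'(\\begin{document})([^\n])', line)
print(repr(m_newline.group(2)))  # '\r'

# \S rejects "\r" because it counts as whitespace, so nothing matches.
m_nonspace = re.search(r'(\\begin{document})(\S)', line)
print(m_nonspace)  # None

# Excluding both line-ending characters keeps the original intent.
PATTERN = re.compile(r'(\\begin{document})([^\r\n])')
for text in [line, "\\begin{document}\n", "\\begin{document}Extra stuff\n"]:
    print(repr(PATTERN.sub(r'\1\n\n\2', text)))

# Python 3 text mode (newline=None, the default) translates "\r\n" to "\n"
# on input, so the stray "\r" never reaches the regex.
data = b"\\begin{document}\r\nThis line\r\n"
translated = io.TextIOWrapper(io.BytesIO(data), encoding="ascii").readlines()
print(translated)
```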

Thank you for your help, Ben




More information about the Python-list mailing list