Problem with re module

Benjamin Kaplan benjamin.kaplan at case.edu
Tue Mar 22 15:07:06 EDT 2011


On Tue, Mar 22, 2011 at 2:40 PM, John Harrington
<beartiger.all at gmail.com> wrote:
> On Mar 22, 11:16 am, John Bokma <j... at castleamber.com> wrote:
>> John Harrington <beartiger.... at gmail.com> writes:
>> > I'm trying to use the following substitution,
>>
>> >      lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n
>> > \2',lineList[i])
>>
>> > I intend this to match any string "\begin{document}" that doesn't end
>> > in a line ending.  If there's no line ending, then, I want to place
>> > two carriage returns between the string and the non-line end
>> > character.
>>
>> > However, this places carriage returns even when the string is followed
>> > directly after with a line ending.  Can someone explain to me why this
>> > match is not behaving as I intend it to, especially the ([^$])?
>>
>> [^$] matches: not a $ character
>>
>> You might want [^\n]
>
> Thank you, John.
>
> I thought that when you use "r" before the regex, $ matches an end of
> line.  But, in any case, if I use "[^\n]" as you suggest I get the
> same result.
>


The r before a string has nothing to do with regexes. It marks a raw
string: backslash escape sequences won't be interpreted.
>>> print 'a\tb'
a	b
>>> print r'a\tb'
a\tb

We use raw strings for regexes because otherwise you'd have to
remember to double up all your backslashes, and double up your
doubled-up backslashes when you really want a literal backslash.
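
To make the doubling concrete, here is a small sketch (Python 3 syntax;
the variable names are made up for this example). Both literals below
spell the same regex, one that matches a literal backslash followed by
"begin":

```python
import re

text = r'\begin{document}'   # the text contains one literal backslash

# Without a raw string, each backslash is doubled once for the string
# literal and once more for the regex engine: four in total.
plain_pattern = '\\\\begin'

# With a raw string, only the regex-level doubling remains.
raw_pattern = r'\\begin'

# Both literals produce the same pattern string.
print(plain_pattern == raw_pattern)

# The pattern matches the single literal backslash in the text.
print(re.match(raw_pattern, text) is not None)
```

Both print statements should report True.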

> Here's a script that illustrates the problem.  Any help would be
> appreciated!:
>
> #BEGIN SCRIPT
> import re
>
> outlist = []
> myfile  = "raw.tex"
>
> fin = open(myfile, "r")
> lineList = fin.readlines()
> fin.close()
>
> for i in range(0,len(lineList)):
>
>     lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n
> \2',lineList[i])
>
>     outlist.append(lineList[i])
>
> fou = open(myfile, "w")
> for i in range(len(outlist)):
>   fou.write(outlist[i])
> fou.close()
> #END SCRIPT
>
> And the file raw.tex:
>
> %BEGIN TeX FILE
> \begin{document}
> This line should remain right after the above line in the output, but
> doesn't
>
> \begin{document}Extra stuff here should appear below the begin line
> and does in the output.
> %END TeX FILE

Works for me. Do you have a space after the \begin{document} or
something? Because that would get moved. You might want to check for
non-whitespace characters in the regex instead of just non-newlines.
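
Here is a rough sketch of what I mean, assuming the replacement in your
script is r'\1\n\n\2' once the mail wrapping is undone. The three sample
lines and the \S variant are my own illustration (Python 3 syntax), not
something taken from your file:

```python
import re

# Pattern from the script: any character except a newline after the tag.
non_newline = r'(\\begin{document})([^\n])'
# Possible alternative: require a non-whitespace character instead.
non_space = r'(\\begin{document})(\S)'

replacement = r'\1\n\n\2'

clean = '\\begin{document}\n'             # tag directly followed by a newline
trailing = '\\begin{document} \n'         # tag followed by a stray space
inline = '\\begin{document}Extra stuff\n' # tag followed by real text

# Print the result of each pattern on each sample line.
for line in (clean, trailing, inline):
    print(repr(re.sub(non_newline, replacement, line)),
          repr(re.sub(non_space, replacement, line)))
```

With [^\n], the line with the stray space gets split, because the space
counts as a non-newline character. With \S, that line is left alone and
only the line with real text after the tag is split.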



