Does Python mess with CRLFs?

Irmen de Jong irmen.NOSPAM at xs4all.nl
Wed Nov 12 13:52:13 EST 2008


Gilles Ganault wrote:
> Hello
> 
> I'm stuck at understanding why Python can't extract some bit from an
> HTML file using regexes, although I can find it just fine with
> UltraEdit.
> 
> #BAD    
> friends  = re.compile('</td></tr></table>\r\n</div>\r\n',re.IGNORECASE
> | re.MULTILINE | re.DOTALL)

If you keep running into trouble and you're sure it's related to the newlines,
it may help to match any whitespace with \s instead of a literal \r\n in your
expression:
  re.compile('</td></tr></table>\\s*</div>\\s*', .... )
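
A minimal sketch of the difference, assuming made-up sample strings (the real "content.html" isn't shown here). If the file was read in text mode, Python may already have translated \r\n to \n, in which case a literal \r\n pattern finds nothing while \s* still matches:

```python
import re

# Hypothetical sample data standing in for the unseen "content.html".
crlf_html = "</td></tr></table>\r\n</div>\r\n"   # line endings kept as CRLF
lf_html = "</td></tr></table>\n</div>\n"         # line endings translated to LF

# Pattern that insists on literal CRLF, as in the original attempt.
literal = re.compile(r"</td></tr></table>\r\n</div>\r\n", re.IGNORECASE)

# Pattern that accepts any run of whitespace, including none at all.
flexible = re.compile(r"</td></tr></table>\s*</div>\s*", re.IGNORECASE)

# One result per sample: does the pattern match anywhere in the string?
literal_hits = [bool(literal.search(s)) for s in (crlf_html, lf_html)]
flexible_hits = [bool(flexible.search(s)) for s in (crlf_html, lf_html)]

print("literal :", literal_hits)
print("flexible:", flexible_hits)
```

Printing repr() of the string you actually search shows which line endings it really contains.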

Other than that, it's hard to say what's not working as expected without knowing
the exact contents of the "content.html" file you're searching in.

--irmen


