Does Python mess with CRLFs?

Gilles Ganault nospam at nospam.com
Wed Nov 12 06:04:07 EST 2008


Hello

I can't work out why Python fails to match a fragment of an HTML file
with a regex, even though UltraEdit finds the same fragment without
any trouble.

Does Python rewrite CRLF line endings when a text file is read with
open()/read()?

Here's the code:
==========
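import re

# Read the whole file in text mode.
f = open("content.html", "r")
content = f.read()
f.close()

# BAD: never matches once the CRLF line endings are included
friends = re.compile('</td></tr></table>\r\n</div>\r\n',
                     re.IGNORECASE | re.MULTILINE | re.DOTALL)

# GOOD: matches when the line endings are left out
# (this assignment replaces the BAD pattern above)
friends = re.compile('</td></tr></table>',
                     re.IGNORECASE | re.MULTILINE | re.DOTALL)

m = friends.search(content)
if m:
    print("Found")
else:
    print("List not found")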
==========
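
One way to test the CRLF suspicion would be to read the same bytes once
in text mode and once in binary mode, then run the pattern against
both. The following is only a minimal sketch, not my actual script: the
sample bytes and the helper name read_both_ways are made up for
illustration, and it uses io.open, whose text mode translates "\r\n" to
"\n" by default.

```python
import io
import os
import re
import tempfile

# Hypothetical sample standing in for the relevant part of content.html.
SAMPLE = b"</td></tr></table>\r\n</div>\r\n"


def read_both_ways(raw_bytes):
    """Write raw_bytes to a temp file, then read it back in text and binary mode."""
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # Text mode: with the default newline handling, "\r\n" comes back as "\n".
        with io.open(path, "r", encoding="ascii") as f:
            as_text = f.read()
        # Binary mode: the bytes come back exactly as stored on disk.
        with io.open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


as_text, as_bytes = read_both_ways(SAMPLE)

# The same pattern three ways: CRLF against text, CRLF against bytes, LF against text.
crlf_text = re.compile(r"</td></tr></table>\r\n</div>\r\n", re.IGNORECASE)
crlf_bytes = re.compile(br"</td></tr></table>\r\n</div>\r\n", re.IGNORECASE)
lf_text = re.compile(r"</td></tr></table>\n</div>\n", re.IGNORECASE)

print("text mode read  :", repr(as_text))
print("binary mode read:", repr(as_bytes))
print("CRLF pattern on text :", bool(crlf_text.search(as_text)))
print("CRLF pattern on bytes:", bool(crlf_bytes.search(as_bytes)))
print("LF pattern on text   :", bool(lf_text.search(as_text)))
```

If newline translation is the cause, the CRLF pattern should fail on
the text-mode string and succeed on the binary one, while a pattern
written with "\n" alone should match the text-mode string.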

Thank you for any tips.


