Get document as normal text and not as binary data
Kent Johnson
kent37 at tds.net
Mon Mar 28 14:37:11 EST 2005
Markus Franz wrote:
> Hi.
>
> I used urllib2 to load an HTML document over HTTP. But my problem
> is:
> The loaded contents are returned as binary data, which means that every
> character is displayed like lÀÃt, for example. How can I get the
> contents as normal text?
My guess is that the HTML is UTF-8 encoded: your sample looks like UTF-8 bytes interpreted as Latin-1. Try decoding the bytes explicitly:
contents = f.read().decode('utf-8')
Kent
>
> My script was:
>
> import urllib2
> req = urllib2.Request(url)
> f = urllib2.urlopen(req)
> contents = f.read()
> print contents
> f.close()
>
> Thanks!
>
> Markus
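
To illustrate the decode suggested above without touching the network, here is a minimal sketch. The helper name decode_body, its parameters, and the sample string are assumptions made up for illustration, not part of the original script; raw stands in for what f.read() returns. It shows the same bytes decoded correctly as UTF-8 and wrongly as Latin-1, which produces the kind of garbage described in the question.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not from the original post): turn the bytes
# returned by f.read() into text.
def decode_body(raw_bytes, charset=None, fallback="utf-8"):
    """Decode raw_bytes with charset if given, otherwise with fallback.

    Undecodable bytes are replaced with U+FFFD instead of raising.
    """
    encoding = charset or fallback
    return raw_bytes.decode(encoding, "replace")


# Stand-in for f.read(): the UTF-8 bytes of a three-character string.
raw = u"l\u00e4t".encode("utf-8")

# Decoding with the right codec recovers the original three characters.
text = decode_body(raw)

# Decoding the same bytes as Latin-1 yields four characters: the two
# bytes of the non-ASCII letter each become a separate character.
mojibake = raw.decode("latin-1")

# Assumption: with urllib2 the server-declared charset, if any, can be
# read from the response headers, e.g. f.info().getparam('charset'),
# and passed in as the charset argument.
```

If the server declares a different encoding (for example iso-8859-1), passing it as charset avoids guessing; UTF-8 is only the fallback here because it matches the symptom in the question.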