How to detect the character encoding in a web page?
Kurt Mueller
kurt.alfred.mueller at gmail.com
Mon Dec 24 03:34:16 EST 2012
On 24.12.2012 at 04:03, iMath wrote:
> But how do you let Python do it for you?
> For example, these 2 pages:
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> How can I detect the character encoding of these 2 pages with Python?
If you have the HTML source, let chardetect.py (the script that ships with
chardet) make an educated guess for you:
http://pypi.python.org/pypi/chardet
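To get the same guess from inside a Python program rather than a shell pipe, you can call chardet.detect() on the raw bytes of the page, before decoding them. A minimal sketch, assuming chardet is installed (pip install chardet); guess_encoding and guess_page_encoding are hypothetical helper names, not part of chardet.

```python
import urllib.request

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # keep the sketch importable without it
    chardet = None


def guess_encoding(data):
    """Return chardet's (encoding, confidence) guess for raw bytes.

    Returns (None, 0.0) if chardet is not installed or cannot guess.
    """
    if chardet is None:
        return None, 0.0
    # detect() takes bytes and returns a dict with 'encoding' and 'confidence'
    result = chardet.detect(data)
    if result["encoding"] is None:
        return None, 0.0
    return result["encoding"], result["confidence"]


def guess_page_encoding(url):
    """Download a page and guess its encoding from the undecoded bytes."""
    with urllib.request.urlopen(url) as response:
        return guess_encoding(response.read())
```

For example, guess_page_encoding("http://python.org/") returns the same kind of (encoding, confidence) pair that chardetect.py prints; treat the confidence as a hint, not a guarantee.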
Example:
$ wget -q -O - http://python.org/ | chardetect.py
stdin: ISO-8859-2 with confidence 0.803579722043
$
$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
stdin: utf-8 with confidence 0.87625
$
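chardet only guesses; the python.org result above is reported with about 0.80 confidence. Many pages also declare their encoding explicitly, so it is often worth checking the declaration first and guessing only when there is none. A minimal sketch using only the standard library, following the order browsers use: byte order mark, then the charset parameter of the HTTP Content-Type header, then a <meta> tag in the first 1024 bytes. declared_charset is a hypothetical name, and the regular expression is a simplification of the HTML prescan algorithm, not a full implementation.

```python
import codecs
import re
from email.message import Message

# Byte order marks checked before anything else.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

# Simplified match for <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def declared_charset(content_type, body, scan_bytes=1024):
    """Return the encoding a page declares (lower-cased), or None.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw, undecoded page bytes.
    """
    # 1. A byte order mark at the very start of the body.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None
        if charset:
            return charset
    # 3. A <meta> declaration near the top of the document.
    match = _META_CHARSET.search(body[:scan_bytes])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

With urllib.request, pass response.headers.get("Content-Type") and response.read(); since http.client.HTTPMessage subclasses email.message.Message, response.headers.get_content_charset() also gives the header part directly. When declared_charset returns None, falling back to a guess such as chardet's is a reasonable next step.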
Greetings
--
kurt.alfred.mueller at gmail.com
More information about the Python-list mailing list