[Python-Dev] Encoding detection in the standard library?
Tony Nelson
tonynelson at georgeanelson.com
Mon Apr 21 20:34:50 CEST 2008
At 1:14 PM -0400 4/21/08, David Wolever wrote:
>On 21-Apr-08, at 12:44 PM, skip at pobox.com wrote:
>>
>> David> Is there some sort of text encoding detection module in the
>> David> standard library? And, if not, is there any reason not to add
>> David> one?
>> No, there's not. I suspect the fact that you can't correctly determine the
>> encoding of a chunk of text 100% of the time militates against it.
>Sorry, I wasn't very clear what I was asking.
>
>I was thinking about making an educated guess -- just like chardet
>(http://chardet.feedparser.org/).
>
>This is useful when you get a hunk of data which _should_ be some
>sort of intelligible text from the Big Scary Internet (say, a posted
>web form or email message), and you want to do something useful with
>it (say, search the content).
Feedparser.org's chardet can't guess 'latin1', so it should be used as a
last resort, just as the docs say.
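
To illustrate what "last resort" could look like in practice, here is a minimal sketch in present-day Python. The function name decode_best_effort and the exact fallback order (declared charset, then UTF-8, then chardet, then latin-1) are assumptions made for illustration, not something prescribed by chardet or by the standard library. The only chardet call used is chardet.detect(), and the sketch skips that step if chardet is not installed.

```python
def decode_best_effort(data, declared=None):
    """Return (text, encoding_used) for bytes of unknown encoding.

    Hypothetical helper: the name and the fallback order are
    illustrative assumptions, not an established API.
    """
    # 1. Prefer explicit information (e.g. a MIME or HTTP charset),
    #    then UTF-8, which rarely validates by accident.
    for enc in (declared, "utf-8"):
        if not enc:
            continue
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset name: try the next candidate.
            pass

    # 2. Last resort: a statistical guess, only if chardet is available.
    try:
        import chardet
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            try:
                return data.decode(guess), guess
            except (UnicodeDecodeError, LookupError):
                pass

    # 3. latin-1 maps every byte to a code point, so this cannot fail,
    #    although the result may be mojibake.
    return data.decode("latin-1"), "latin-1"
```

For example, decode_best_effort(b"caf\xe9", declared="latin-1") decodes using the declared charset and never consults the detector. The returned encoding name makes it possible to log how often the guessing step was actually needed.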
--
____________________________________________________________________
TonyN.:' <mailto:tonynelson at georgeanelson.com>
' <http://www.georgeanelson.com/>