[Python-Dev] Encoding detection in the standard library?

David Wolever wolever at cs.toronto.edu
Tue Apr 22 03:41:51 CEST 2008


On 21-Apr-08, at 5:31 PM, Martin v. Löwis wrote:
>> This is useful when you get a hunk of data which _should_ be some
>> sort of intelligible text from the Big Scary Internet (say, a posted
>> web form or email message), and you want to do something useful with
>> it (say, search the content).
> I don't think that should be part of the standard library. People
> will mistake what it tells them for certain.
As Oleg mentioned, if the method is called something like  
'guess_encoding', I think we could live with clear consciences.

IMO, encoding estimation is something that many web programs will
have to deal with, so it might as well be built in. I would prefer
having the option to run `text=input.encode('guess')` (or something
similar) over relying on an external dependency or, worse yet, using
a hand-rolled algorithm.
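
To make the idea concrete, here is a rough sketch of what a
'guess_encoding' helper might look like. Everything in it is an
assumption for illustration: the function signature, the candidate
list and the latin-1 fallback are hypothetical, and nothing like this
exists in the standard library. It only checks for a byte order mark
and then tries a few strict decodes, so the result is a guess, not a
certainty.

```python
import codecs

# Hypothetical BOM table. The UTF-32 marks come before the UTF-16 ones
# because the UTF-32-LE BOM starts with the same bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_encoding(data, candidates=("ascii", "utf-8"), fallback="latin-1"):
    """Return a plausible encoding name for the bytes in `data`.

    Illustrative only: the name, the parameters and the fallback are
    assumptions, not an existing API.
    """
    # A byte order mark is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Otherwise take the first candidate that decodes without errors.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # latin-1 maps every byte to a character, so decoding with it never
    # fails, but the resulting text may still be wrong.
    return fallback


raw = "Martin v. Löwis".encode("utf-8")
text = raw.decode(guess_encoding(raw))
```

The caller still decodes explicitly with the returned name, which keeps
it visible that the encoding was estimated rather than known.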

