[Python-Dev] Encoding detection in the standard library?

Stephen J. Turnbull stephen at xemacs.org
Wed Apr 23 06:59:50 CEST 2008


Guido van Rossum writes:

 > To the contrary, an encoding-guessing module is often needed, and
 > guessing can be done with a pretty high success rate. Other Unicode
 > libraries (e.g. ICU) contain guessing modules. I suppose the API could
 > return two values: the guessed encoding and a confidence indicator.
 > Note that the locale settings might figure in the guess.

Not locale settings, but user configuration.  A Bayesian detector
(CodeBayes? hi, Skip!) might be a good way to go for servers, while for
user agents a simple language preference could substantially raise the
probability of a correct guess.
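The two-value API discussed above could look something like the following minimal sketch. The function name `guess_encoding`, the `preferred` parameter (standing in for user configuration such as a language preference), the order of checks, and every confidence number are hypothetical illustrations. They are not an existing stdlib or ICU API, and the confidences are placeholders rather than calibrated probabilities.

```python
from typing import Sequence, Tuple

# Byte-order marks. UTF-32 must be tested before UTF-16 because the
# UTF-32-LE mark begins with the UTF-16-LE mark.
_BOMS = (
    (b"\xff\xfe\x00\x00", "utf-32"),
    (b"\x00\x00\xfe\xff", "utf-32"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xff\xfe", "utf-16"),
    (b"\xfe\xff", "utf-16"),
)


def guess_encoding(data: bytes,
                   preferred: Sequence[str] = ()) -> Tuple[str, float]:
    """Hypothetical helper: return (encoding, confidence), confidence in [0, 1].

    `preferred` is a hypothetical stand-in for user configuration, e.g.
    ("euc-jp", "shift_jis") for a Japanese user. All confidence values
    are illustrative placeholders, not calibrated probabilities.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, 1.0  # a BOM is near-conclusive evidence
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        pass
    else:
        return "ascii", 0.9  # valid in most encodings, so a safe answer
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    else:
        return "utf-8", 0.8  # non-ASCII bytes that are valid UTF-8 are rarely accidental
    for name in preferred:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # does not fit, or the codec name is unknown
        return name, 0.6  # fits an encoding the user said to expect
    return "latin-1", 0.1  # latin-1 decodes any bytes, so it is the weakest guess
```

For example, `guess_encoding("日本語".encode("euc-jp"), preferred=("euc-jp",))` picks `euc-jp`, while the same bytes without the preference fall through to the low-confidence `latin-1` guess. This illustrates the point that user configuration, rather than a statistical model alone, can decide ambiguous cases. A Bayesian detector would replace the fixed numbers with learned probabilities.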
