Unicode Debugging Hints?

holger krekel pyth at devel.trillke.net
Tue Oct 8 09:31:08 EDT 2002


Martin v. Löwis wrote:
> holger krekel <pyth at devel.trillke.net> writes:
> 
> > does anyone have some small functions to answer
> > questions 'might this be latin1' or 'might this be utf8'
> > or 'is this definitely not latin1' and such?
> 
> Some of these questions can be answered really simply:
> 
> def maybe_encoding(s, enc):
>   try:
>     unicode(s, enc)
>     return 1
>   except UnicodeError:
>     return 0
> 
> def is_ascii(s): return maybe_encoding(s, 'ascii')
> 
> def is_utf_8(s): return not is_ascii(s) and maybe_encoding(s, 'utf-8')
> 
> def maybe_latin_x(s):
>   if is_ascii(s) or is_utf_8(s): return 0
>   for c in s:
>     if 128 <= ord(c) < 160:
>       return 0
>   return 1
> 
> Telling apart the Latin-x variants is not really possible.
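
For anyone reading this on Python 3, where unicode() no longer exists, here is
a rough sketch of the same checks written against bytes objects. The type
hints, the True/False return values and the use of any() are assumptions of
this sketch, not part of the quoted code, and the result is still only a
heuristic: it can say "might be", never "is".

```python
def maybe_encoding(data: bytes, enc: str) -> bool:
    """Return True if data decodes without error under the codec enc."""
    try:
        data.decode(enc)
        return True
    except UnicodeDecodeError:
        return False


def is_ascii(data: bytes) -> bool:
    return maybe_encoding(data, "ascii")


def is_utf_8(data: bytes) -> bool:
    # Pure ASCII is excluded, as in the quoted code, so this means
    # "has non-ASCII bytes and they form valid UTF-8".
    return not is_ascii(data) and maybe_encoding(data, "utf-8")


def maybe_latin_x(data: bytes) -> bool:
    if is_ascii(data) or is_utf_8(data):
        return False
    # Bytes 0x80-0x9F are C1 control codes in the ISO 8859 family and
    # rarely occur in real text, so their presence argues against Latin-x.
    return not any(0x80 <= b < 0xA0 for b in data)
```

Note that some Latin-1 byte strings are also valid UTF-8 (for example
b"\xc3\xa9"), and maybe_latin_x() reports those as "not Latin-x" because the
UTF-8 check wins.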

thanks for the info and code.

    holger



