Unicode Debugging Hints?
holger krekel
pyth at devel.trillke.net
Tue Oct 8 09:31:08 EDT 2002
Martin v. Löwis wrote:
> holger krekel <pyth at devel.trillke.net> writes:
>
> > does anyone have some small functions to answer
> > questions 'might this be latin1' or 'might this be utf8'
> > or 'is this definitely not latin1' and such?
>
> Some of these questions can be answered quite simply:
>
> def maybe_encoding(s, enc):
>     # true if the byte string s decodes cleanly as enc
>     try:
>         unicode(s, enc)
>         return 1
>     except UnicodeError:
>         return 0
>
> def is_ascii(s): return maybe_encoding(s, 'ascii')
>
> def is_utf_8(s): return not is_ascii(s) and maybe_encoding(s, 'utf-8')
>
> def maybe_latin_x(s):
>     if is_ascii(s) or is_utf_8(s): return 0
>     for c in s:
>         # 128..159 are C1 control codes in the ISO 8859 family
>         if 128 <= ord(c) < 160:
>             return 0
>     return 1
>
> Telling apart the Latin-x variants is not really possible.
Thanks for the info and code.
holger
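
For readers on Python 3, where the `unicode()` builtin no longer exists, the quoted checks might be sketched roughly as follows. This is only an adaptation under stated assumptions, not part of the original post: the input is assumed to be a `bytes` object, the function names are borrowed from the quoted code, and the type hints and boolean return values are additions. Like the original, these are heuristics: a "yes" means "not ruled out", never "confirmed".

```python
def maybe_encoding(data: bytes, enc: str) -> bool:
    """Return True if data decodes without error under enc."""
    try:
        data.decode(enc)
        return True
    except UnicodeError:
        return False


def is_ascii(data: bytes) -> bool:
    return maybe_encoding(data, "ascii")


def is_utf_8(data: bytes) -> bool:
    # Pure ASCII is also valid UTF-8; as in the quoted code, it is
    # reported separately so the answers do not overlap.
    return not is_ascii(data) and maybe_encoding(data, "utf-8")


def maybe_latin_x(data: bytes) -> bool:
    if is_ascii(data) or is_utf_8(data):
        return False
    # Iterating over bytes yields ints in Python 3, so no ord() is needed.
    # 0x80..0x9F are C1 control codes in the ISO 8859 family, so their
    # presence suggests some other encoding (for example a Windows code page).
    return not any(0x80 <= b < 0xA0 for b in data)
```

A quick check with `"é".encode("latin-1")` should pass `maybe_latin_x` but fail `is_utf_8`, while `"é".encode("utf-8")` should do the opposite.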