Is 0 > None?? (fwd) (fwd)

Wed Sep 5 05:32:48 EDT 2001

"Terry Reedy" <tjreedy at home.com> wrote in message
news:Ay9l7.169130$EP6.48752519 at news1.rdc2.pa.home.com...
>
> "Alex Martelli" <aleax at aleax.it> wrote in message
> news:mailman.999590525.31032.python-list at python.org...
> a long message on the meaning of
>
>  >     UnicodeError: ASCII decoding error: ordinal not in range(128)
>
> Ok, Alex, at least I get it now.  To summarize:
>
> A. Python lets me put any 8-bit pattern in any position ('char') of a
> string (i.e., it's 8-bit clean).
> It assumes that I know what I am doing for the calculations and
> interactions with other systems and programs that I intend to do.

Yep -- so far, so good.

> B. By default default, when asked (directly or indirectly) to
> interpret a string as a string of chars, Python (the interpreter) only
> understands the 7-bit ASCII chars.

Not really.  s/interpret a string as a string of chars/widen
a plain string to a Unicode string/.  The characters having a
special meaning to "Python, the interpreter" (the compiler,
actually) all happen to lie within the 7-bit ASCII range, but
not even all of those are meaningful (except in string
literals, where all of them are): for example, the '@' character has
no semantic role in Python, and neither do most control chars.

It's when a plain string must be implicitly widened to Unicode
that the 'default default' enters the picture -- specifically,
the codec whose name is returned by sys.getdefaultencoding().
This is currently 'ascii' (unless you've arranged things
differently in site.py or sitecustomize.py -- when site.py
is done executing, it removes sys.setdefaultencoding so you
can't change the default encoding any longer) -- and it
behaves just like its name would lead one to believe:-).
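For contrast, here is a minimal sketch of how this looks on a present-day Python 3 interpreter (an assumption: everything in this thread is Python 2.x, where the behaviour differs). There, sys.getdefaultencoding() reports 'utf-8', sys.setdefaultencoding no longer exists at all, and bytes are never implicitly widened to text, so mixing the two raises TypeError rather than UnicodeError.

```python
import sys

# Python 3 reports UTF-8 as the default encoding; it is not configurable.
default = sys.getdefaultencoding()
print(default)

# setdefaultencoding is simply absent from the sys module in Python 3.
print(hasattr(sys, "setdefaultencoding"))

# Bytes are never implicitly widened to str: concatenation is a TypeError.
raw = b"cia\xf3"  # 'ciao' with an accented o, encoded as Latin-1
try:
    mixed = raw + "!"
except TypeError as exc:
    mixed = None
    print("no implicit widening:", exc)
```

So the "default default" problem described here largely disappears in Python 3: every bytes-to-text conversion has to be explicit.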

> If asked to interpret a
> high-bit-set pattern, it literally does not know what that pattern
> means, and so, rather than guess, it stops with the above error
> message.

When you explicitly convert plain strings to Unicode ones, you
can tell Python what to do in case of errors:

>>> x="ciaó"
>>> unicode(x,'ascii','ignore')
u'cia'
>>> unicode(x,'ascii','replace')
u'cia\ufffd'
>>> unicode(x,'ascii','strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)
>>>
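For comparison, a sketch of the same three conversions in Python 3 (assuming a Python 3 interpreter, where explicit widening is spelled bytes.decode(encoding, errors) instead of unicode(s, encoding, errors), and where the strict failure raises UnicodeDecodeError, a subclass of UnicodeError):

```python
x = b"cia\xf3"  # Latin-1 bytes; 0xF3 is outside the 7-bit ASCII range

ignored = x.decode("ascii", "ignore")    # errant byte is dropped
replaced = x.decode("ascii", "replace")  # errant byte becomes U+FFFD

try:
    x.decode("ascii", "strict")          # 'strict' is also the default
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print(exc)

print(repr(ignored), repr(replaced), strict_failed)
```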

Only with 'strict' handling of errors does Python raise an
exception (which doesn't have to mean "stopping" -- you can
catch and handle exceptions with the usual try/except syntax).
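A minimal sketch of catching the strict failure and recovering, again assuming Python 3. The helper name to_text and its fall-back-to-'replace' policy are hypothetical, just one plausible way to handle the exception:

```python
def to_text(raw: bytes, encoding: str = "ascii") -> str:
    """Hypothetical helper: decode strictly, fall back to 'replace' on error."""
    try:
        return raw.decode(encoding)  # errors defaults to 'strict'
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the first undecodable byte range.
        bad = raw[exc.start:exc.end]
        print(f"undecodable {bad!r} at offset {exc.start}; substituting U+FFFD")
        return raw.decode(encoding, "replace")

print(to_text(b"ciao"))
print(to_text(b"cia\xf3"))
```

Whether to fall back, log, or re-raise is an application decision; the point is only that the exception is an ordinary one.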

With 'ignore' handling, errant characters are silently,
well, ignored, and with 'replace' handling, each one turns into
u'\uFFFD', the Unicode REPLACEMENT CHARACTER.  But 'strict'
is the default when the widening is implicit (and I don't
think you can override THAT part of the default, whatever
you do in site.py and/or sitecustomize.py:-).
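To check what the 'replace' error-character actually is, a short sketch (Python 3 assumed) using the standard unicodedata module: U+FFFD is officially named REPLACEMENT CHARACTER, and the ASCII codec's 'replace' handler substitutes one for each undecodable byte.

```python
import unicodedata

marker = "\ufffd"
print(unicodedata.name(marker))

# Each byte outside range(128) is replaced individually by the ASCII codec.
decoded = b"\xf3\xf3ok".decode("ascii", "replace")
print(decoded.count(marker))
```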

Alex