Unicode

leam hall leamhall at gmail.com
Sun Sep 17 12:20:08 EDT 2017


On Sun, Sep 17, 2017 at 9:13 AM, Peter Otten <__peter__ at web.de> wrote:

> Leam Hall wrote:
>
> > On 09/17/2017 08:30 AM, Chris Angelico wrote:
> >> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall <leamhall at gmail.com> wrote:
> >>> Still trying to keep this Py2 and Py3 compatible.
> >>>
> >>> The Py2 error is:
> >>>          UnicodeEncodeError: 'ascii' codec can't encode character
> >>>          u'\xf6' in position 8: ordinal not in range(128)
> >>>
> >>> even when the string is manually converted:
> >>>          name    = unicode(self.name)
> >>>
> >>> Same sort of issue with:
> >>>          name    = self.name.decode('utf-8')
> >>>
> >>>
> >>> Py3 doesn't like either version.
> >>
> >> You got a Unicode *EN*code error when you tried to *DE*code. That's a
> >> quirk of Py2's coercion behaviours, so the error's a bit obscure, but
> >> it means that you (most likely) actually have a Unicode string
> >> already. Check what type(self.name) is, and see if the problem is
> >> actually somewhere else.
> >>
> >> (It's hard to give more specific advice based on this tiny snippet,
> >> sorry.)
> >>
> >> ChrisA
> >>
> >
> > Chris, thanks! I see what you mean.
>
> I don't think so. You get a unicode from the database,
>
> $ python
> Python 2.7.6 (default, Oct 26 2016, 20:30:19)
> [GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sqlite3
> >>> db = sqlite3.connect(":memory:")
> >>> cs = db.cursor()
> >>> cs.execute("select 'foo';").fetchone()
> (u'foo',)
> >>>
>
> and when you try to decode it (which is superfluous as you already have
> unicode!) Python does what you ask for. But to be able to decode, it has to
> encode first, and by default it uses the ascii codec for that attempt. For
> an all-ascii string
>
> u"foo".encode("ascii") --> "foo"
>
> and thus
>
> u"foo".decode("utf-8")
>
> implemented as
>
> u"foo".encode("ascii").decode("utf-8") --> u"foo"
>
> is basically a noop. However
>
> u"äöü".encode("ascii") --> raises UnicodeENCODEError
>
> and thus
>
> u"äöü".decode("utf-8")
>
> fails with that. Unfortunately, few people realize that it was the encoding
> step that failed, so they try, without success, to pass other encodings to
> the decoding step
>
> u"äöü".decode("latin1")  # also fails
>
> Solution: if you already have unicode, leave it alone.
>

That doesn't seem to work. The failing code takes the strings as-is from the
database. It occasionally fails when a name comes up that uses a non-ASCII
character.
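
For reference, here is a minimal sketch of the "leave unicode alone" rule
that should run on both Py2 and Py3. The helper names to_text and to_bytes
are hypothetical (they are not taken from character.py), and UTF-8 is an
assumed encoding. The idea is to decode only byte strings on the way in, and
to encode explicitly only where bytes are actually required.

```python
def to_text(value, encoding="utf-8"):
    """Return text; decode only if value is a byte string (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (unicode on Py2, str on Py3): leave it alone.
    return value


def to_bytes(value, encoding="utf-8"):
    """Return bytes; encode only if value is text (assumed UTF-8)."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


# A name with a non-ASCII character, once as text and once as UTF-8 bytes.
name_text = u"J\xf6rg"
name_bytes = b"J\xc3\xb6rg"

assert to_text(name_text) == name_text    # text passes through untouched
assert to_text(name_bytes) == name_text   # bytes are decoded once
assert to_bytes(name_text) == name_bytes  # explicit encode at the boundary
```

On Py2, calling .decode() on a value that is already unicode is what
triggers the implicit ascii encode, so the isinstance check avoids that path.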

The relevant lines are 44, 60, 66, and 67 of:

https://github.com/makhidkarun/py_tools/blob/master/lib/character.py
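
To follow Chris's suggestion of checking type(self.name) at each of those
lines, something like the following could help. The name describe is made up
for illustration. It reports the type and an ASCII-escaped repr, so the
report itself should not raise on an ASCII-only console.

```python
def describe(value):
    """Return 'typename: repr' with any non-ASCII characters escaped."""
    kind = type(value).__name__
    # backslashreplace keeps the output pure ASCII on both Py2 and Py3.
    escaped = repr(value).encode("ascii", "backslashreplace").decode("ascii")
    return "{0}: {1}".format(kind, escaped)


# Py2 would report 'unicode' or 'str' here; Py3 reports 'str' or 'bytes'.
print(describe(u"J\xf6rg"))
print(describe(b"J\xc3\xb6rg"))
```

If the type reported is already the text type, there is nothing to decode,
and the failure is more likely where the value gets written out or formatted.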

Leam


