Unicode chr(150) en dash

"Martin v. Löwis" martin at v.loewis.de
Wed Apr 16 02:24:56 EDT 2008


> "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in execute
>     query = query.encode(charset)
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013'
> in position 52: ordinal not in range(256)

The error message says that the query contains the character U+2013,
which is "EN DASH", and that the encoding called "latin-1" cannot
represent that character.

That is correct: Latin-1 covers only the code points U+0000 through
U+00FF, so it has no EN DASH.
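
As a minimal sketch of the same failure outside MySQLdb: the snippet below
tries to encode EN DASH with three codecs. It uses Python 3 syntax (the
thread itself is about Python 2.4), and the helper name try_encode is made
up for illustration.

```python
# Python 3 sketch; the original thread used Python 2.4.
EN_DASH = "\u2013"

def try_encode(text, charset):
    """Return the encoded bytes, or None if charset cannot represent text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

# Latin-1 stops at U+00FF, so U+2013 cannot be encoded.
latin1_result = try_encode(EN_DASH, "latin-1")

# Windows-1252 assigns EN DASH to the single byte 0x96 (decimal 150).
cp1252_result = try_encode(EN_DASH, "windows-1252")

# UTF-8 can encode every Unicode character; EN DASH takes three bytes.
utf8_result = try_encode(EN_DASH, "utf-8")

print(latin1_result, cp1252_result, utf8_result)
```

The windows-1252 result is the byte 150 from the subject line, which is why
the same data can look fine in one place and fail in another.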

> When I type 'print chr(150)' into a python command line window I get
> a LATIN SMALL LETTER U WITH CIRCUMFLEX
> (http://www.fileformat.info/info/unicode/char/00fb/index.htm),

That's because your console uses the code page 437:

py> import unicodedata
py> chr(150).decode("cp437")
u'\xfb'
py> unicodedata.name(_)
'LATIN SMALL LETTER U WITH CIRCUMFLEX'

Code page 437, on your system, is the "OEM code page".

> but when I do so in an IDLE window I get a hyphen (chr(45)).

That's because IDLE uses the "ANSI code page" of your system,
which is Windows code page 1252.

py> chr(150).decode("windows-1252")
u'\u2013'
py> unicodedata.name(_)
'EN DASH'
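
To see the code pages side by side, here is a rough Python 3 sketch that
decodes the single byte 150 under each code page discussed here, plus
Latin-1 for comparison. In Python 3 the byte is written bytes([150])
rather than chr(150); the helper name describe is hypothetical.

```python
import unicodedata

# The single byte 0x96, which is what Python 2's chr(150) produced.
raw = bytes([150])

def describe(raw_bytes, codepage):
    """Decode raw_bytes with codepage; return (code point, character name)."""
    ch = raw_bytes.decode(codepage)
    # Control characters have no name, so supply a default instead of raising.
    name = unicodedata.name(ch, "<unnamed control character>")
    return "U+%04X" % ord(ch), name

oem = describe(raw, "cp437")           # OEM code page (console)
ansi = describe(raw, "windows-1252")   # ANSI code page (IDLE)
latin1 = describe(raw, "latin-1")      # the charset from the traceback

for label, result in (("cp437", oem), ("windows-1252", ansi), ("latin-1", latin1)):
    print(label, result)
```

The byte is the same in all three cases; only the interpretation changes.
In Latin-1, byte 150 is a C1 control character, not a dash of any kind.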

You actually *don't* get the character U+002D, HYPHEN-MINUS,
displayed - just a character that has, in your font, a glyph
which looks similar to the glyph for HYPHEN-MINUS.
However, HYPHEN-MINUS and EN DASH are different characters, and
IDLE displays the latter, not the former.
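
A short way to check which of the two characters you really have, again
as a Python 3 sketch rather than the 2.4 session from the thread:

```python
import unicodedata

hyphen_minus = chr(45)   # U+002D
en_dash = "\u2013"       # what byte 150 decodes to under windows-1252

# The glyphs may look alike, but the characters compare unequal.
same_character = hyphen_minus == en_dash

code_points = (ord(hyphen_minus), ord(en_dash))
names = (unicodedata.name(hyphen_minus), unicodedata.name(en_dash))

print(same_character, code_points, names)
```

Comparing ord() values or names is more reliable than judging by how the
character is drawn in a particular font.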

> I tried searching "en dash" or even "dash" into the encodings folder
> of python Lib, but I couldn't find anything.

You didn't ask a specific question, so I assume you are primarily
after an explanation.
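
As for searching the encodings folder: an alternative is to ask the codecs
themselves whether they can encode the character. The sketch below is
Python 3, and both the candidate list and the function name
codecs_supporting are arbitrary choices for illustration, not anything
from the thread.

```python
# A hand-picked list of codec names to probe; extend it as needed.
CANDIDATES = ["latin-1", "cp437", "windows-1252", "mac-roman", "utf-8"]

def codecs_supporting(ch, codec_names):
    """Map each codec that can encode ch to the bytes it produces."""
    found = {}
    for name in codec_names:
        try:
            found[name] = ch.encode(name)
        except UnicodeEncodeError:
            # This codec has no representation for ch; skip it.
            pass
    return found

support = codecs_supporting("\u2013", CANDIDATES)

for name, encoded in support.items():
    print(name, encoded)
```

Neither latin-1 nor cp437 shows up in the result, which matches the error
in the traceback and the different character seen in the console.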

HTH,
Martin


