[IPython-dev] ASCII Terminal IPython re-encodes bytes greater than 127

Thomas Ballinger tom at hackerschool.com
Sat Jul 26 20:04:00 EDT 2014


Since this edge case isn't important to users who need characters like þ
to write text, I'm going to ignore bytes 128-255 on ASCII terminals.
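
A minimal sketch of that filter follows. It assumes input arrives as raw bytes together with the encoding the terminal reports (e.g. sys.stdin.encoding). `drop_high_bytes` is a hypothetical helper for illustration, not an existing bpython or IPython function:

```python
import codecs


def drop_high_bytes(data: bytes, encoding: str) -> bytes:
    """Drop bytes 128-255 if the terminal reports an ASCII encoding.

    Hypothetical helper for illustration; not part of bpython or IPython.
    """
    # codecs.lookup() resolves aliases such as "US-ASCII" to the
    # canonical codec name "ascii".
    if codecs.lookup(encoding).name != "ascii":
        return data  # non-ASCII terminals: leave the bytes alone
    # Keep only 7-bit bytes; meta-key presses like 0xFE are discarded.
    return bytes(b for b in data if b < 128)


print(drop_high_bytes(b"a\xfeb", "US-ASCII"))  # b'ab'
print(drop_high_bytes(b"a\xfeb", "utf-8"))     # b'a\xfeb'
```

Dropping the bytes silently (rather than substituting U+FFFD) matches the choice above for keyboard input on ASCII terminals.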

The system I've noticed this behavior on is iTerm2 on OS X 10.9, when I use
the old-style meta key (under Preferences/Profiles/Keys) with ~, which
produces chr(ord('~') + 128) and is displayed as þ (so these characters seem
to be displayed as latin-1). Pasting never puts these bytes in; instead, odd
replacements happen, like þåß∂ƒ -> thas??

By "paste event" I only meant that we read a bunch of bytes at once and
infer that this was probably due to a paste. I'm going to ignore these bytes
either way.
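
A minimal sketch of that heuristic follows, assuming the terminal is in a non-canonical (raw/cbreak) mode so that `os.read` returns whatever bytes are already available. `PASTE_THRESHOLD` and `read_chunk` are hypothetical names, and the threshold value is an arbitrary assumption chosen to exceed a typical escape sequence such as an arrow key's `\x1b[A`:

```python
import os
from typing import Tuple

# Hypothetical cutoff: a single keypress, even one that sends an escape
# sequence like b"\x1b[A" or b"\x1b[1;5C", rarely exceeds this many bytes.
PASTE_THRESHOLD = 8


def read_chunk(fd: int, max_bytes: int = 4096) -> Tuple[bytes, bool]:
    """Read whatever input is ready and guess whether it came from a paste."""
    data = os.read(fd, max_bytes)  # up to max_bytes of the input that is ready
    return data, len(data) > PASTE_THRESHOLD


# Demo with a pipe standing in for the terminal.
r, w = os.pipe()
os.write(w, b"print('hello world')")
print(read_chunk(r))  # (b"print('hello world')", True)
os.write(w, b"a")
print(read_chunk(r))  # (b'a', False)
os.close(r)
os.close(w)
```

This is only an inference: a slow paste split across several reads, or unusually fast typing, will be misclassified.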

Thanks! I'm excited to move forward with this, knowing this solution is
something that's already working for the large IPython community.

Tom


On Sat, Jul 26, 2014 at 2:49 PM, Thomas Kluyver <takowl at gmail.com> wrote:

> On 26 July 2014 11:31, Thomas Ballinger <tom at hackerschool.com> wrote:
>
>> If I understand correctly, IPython is something like
>>
>> repr(eval(raw_input('>>> ').decode(sys.stdin.encoding, 'replace')))
>>
>
> Yes, that's more or less correct in Python 2. In Python 3, input() returns
> unicode, which makes things easier.
>
>
>> and therefore b'þ' in an ascii encoded terminal will end up being the
>> unicode replacement character \ufffd because it can't be encoded in ascii,
>> the reported encoding. When the code is evaluated, if it's not in a string
>> literal it will be a syntax error (though in an ascii terminal this
>> traceback can't be written to stdout). If it appears in a unicode literal,
>> it's \ufffd, and if it's in a bytestring literal it's \xef\xbf\xbd, the utf8
>> encoding of the previous.
>>
>
> If the terminal is really ascii encoded, b'þ' is not even possible in the
> first place. If the terminal claims incorrectly to be ascii encoded, then
> it's not clear what bytes IPython sees when you type the character þ. The
> most likely candidates would be the single byte FE if it's really latin1 or
> cp1252, or the two bytes C3 BE if it's really UTF-8. So when IPython tries
> to decode it, it will become one or two \ufffd characters.
>
>
>
>> This is simpler than the behavior I guessed was happening because I
>> didn't look up what \ufffd was (
>> http://en.wikipedia.org/wiki/Specials_(Unicode_block) - I wrongly
>> assumed IPython was decoding this byte with latin-1 and then re-encoding it
>> with utf8).
>>
>> If one was in a position to reject keys on a byte-by-byte basis (as
>> bpython is) might it make sense to simply reject these bytes? If they come
>> from the keyboard, they're funny meta key presses (you pressed meta-a; it
>> doesn't do anything) and if they come from a paste event, the terminal
>> emulator is doing a terrible job encoding into the reported encoding.
>> However, a few missing bytes would be more confusing than a few
>> characters being replaced with \ufffd.
>>
>> I think I want to ignore these bytes individually, but replace them with
>> \ufffd when they happen in paste events, but I'd love to hear comments on
>> this (we can take the discussion off this list if it's off topic). Thanks
>> very much for any input (and for IPython, which is obviously awesome).
>>
>
> What system has a terminal that claims to be ASCII but isn't? In my
> experience, most terminals on recent systems report either that they are
> UTF-8, or one of the Windows code pages.
>
> If the terminal does actually claim to be ASCII when it isn't, I'd
> consider that a bug in the terminal, and probably wouldn't feel bad about
> rejecting non-ascii keypresses.
>
> If you get paste events as a separate thing, you may be able to retrieve a
> unicode string from the clipboard, and avoid going via the terminal's
> encoding.
>
> Thomas
>
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev
>
>