decoding keyboard input when using curses

Chris Jones cjns1989 at gmail.com
Sun May 31 02:22:20 EDT 2009


On Sat, May 30, 2009 at 04:55:19PM EDT, Arnaud Delobelle wrote:

> Hi all,

Disclaimer: I am not familiar with the curses python implementation and
I'm neither an ncurses nor a "unicode" expert by a long shot.

:-)

> I am looking for advice on how to use unicode with curses.  First I will
> explain my understanding of how curses deals with keyboard input and how
> it differs with what I would like.
> 
> The curses module has a window.getch() function to capture keyboard
> input.  This function returns an integer which is more or less:
> 
> * a byte if the key which was pressed is a printable character (e.g. a,
>   F, &);
> 
> * an integer > 255 if it is a special key, e.g. if you press KEY_UP it
>   returns 259.

The getch(3NCURSES) function returns an integer. Provided it's large
enough to accommodate the highest possible value, the actual size in
bytes of the integer should be irrelevant.

> As far as I know, curses is totally unicode unaware, 

My impression is that rather than "unicode unaware", it is "unicode
transparent" - or (nitpicking) "UTF8 transparent" - since I'm not sure
other flavors of unicode are supported.

> so if the key pressed is printable but not ASCII, 

.. nitpicking again, but ASCII is a 7-bit encoding: 0-127.

> the getch() function will return one or more bytes depending on the
> encoding in the terminal.

I don't know about the Python implementation, but my guess is that it
should closely follow the underlying ncurses API, so the above is
basically correct. It's not a question of the number of bytes, though,
but rather of the range of integers returned: if your locale is en_US
then that should be 0-255; if it is en_US.utf8 the range is
considerably larger.

> E.g. given utf-8 encoding, if I press the key 'é' on my keyboard (which
> encoded as '\xc3\xa9' in utf-8), I will need two calls to getch() to get
> this: the first one will return 0xC3 and the second one 0xA9.

No. A single call to getch() will grab your "é" and return 0xc3a9,
decimal 50089.
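
For what it's worth, the arithmetic behind that number can be checked in
Python 3. Note that 0xc3a9 is simply the two UTF-8 bytes read as one
big-endian integer; it is not the Unicode code point of "é", which is
U+00E9 (233). The variable names below are mine:

```python
# UTF-8 encoding of "é": two bytes, 0xC3 and 0xA9
encoded = "é".encode("utf-8")

# The same two bytes read as a single big-endian integer
packed = int.from_bytes(encoded, "big")

# The Unicode code point, which is a different number altogether
code_point = ord("é")

print(encoded, hex(packed), packed, code_point)
```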

> Instead of getting a stream of bytes and special keycodes (with value >
> 255) from getch(), what I want is a stream of *unicode characters* and
> special keycodes.

This is what getch(3NCURSES) does: it returns the integer value of one
"unicode character".

Likewise, I would assume that looping over the python equivalent of
getch() will not return a stream of bytes but rather a "stream" of
integers that map one to one to the "unicode characters" that were
entered at the terminal.
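
Should the Python getch() turn out to hand back one byte per call after
all, as you describe, the stream can still be reassembled on the Python
side. A rough sketch in Python 3, under a few assumptions: decode_keys is
a name I made up, the "> 255 means special key" test is taken from your
description, and the sample values are the ones from your message:

```python
import codecs

def decode_keys(codes, encoding="utf-8"):
    """Turn getch()-style integers into characters and special key codes."""
    decoder = codecs.getincrementaldecoder(encoding)()
    out = []
    for code in codes:
        if code > 255:
            # Assumed to be a special key such as KEY_UP: pass it through.
            out.append(code)
            continue
        # Feed one byte; the decoder returns "" until a character is complete.
        text = decoder.decode(bytes([code]))
        if text:
            out.append(text)
    return out

print(decode_keys([84, 195, 169, 259, 195, 167, 97]))
```

The incremental decoder keeps the partial multi-byte sequence between
calls, so nothing has to be known in advance about how many bytes a
character takes. The sketch does not handle a special key arriving in the
middle of a multi-byte character.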

Note: I am only familiar with languages such as English, Spanish,
French, etc. where only one terminal cell is used for each glyph. My
understanding is that things get somewhat more complicated with
languages that require so-called "wide characters" - two terminal cells
per character, but that's a different issue.

> So, still assuming utf-8 encoding in the terminal, if I type:
> 
>     Té[KEY_UP]ça
> 
> iterating call to the getch() function will give me this sequence of
> integers:
> 
>     84, 195, 169, 259,   195, 167, 97
>     T-  é-------  KEY_UP ç-------  a-
> 
> But what I want to get this stream instead:
> 
>     u'T', u'é', 259, u'ç', u'a'

No, for the above, getch() will return:

     84, 50089, 259, 50087, 97

.. which is "functionally" equivalent to:

     u'T', u'é', 259, u'ç', u'a'

[..]

So shouldn't this issue boil down to just a matter of casting the
integers to the "u" data type?
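
In Python 3 terms, and only under the assumption that the integers
really do arrive packed the way I describe above, the "cast" could look
something like this. packed_to_key is a hypothetical helper of mine, and
the special-key range is the KEY_MIN/KEY_MAX pair from the ncurses
header, copied here as plain constants:

```python
# Assumed ncurses special-key range (octal 0401-0777, i.e. 257-511)
KEY_MIN, KEY_MAX = 0o401, 0o777

def packed_to_key(value):
    """Map a packed integer to a character, leaving special keys as-is."""
    if KEY_MIN <= value <= KEY_MAX:
        return value
    # Unpack the integer into its bytes (at least one) and decode as UTF-8.
    nbytes = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(nbytes, "big").decode("utf-8")

print([packed_to_key(v) for v in [84, 50089, 259, 50087, 97]])
```

The smallest packed two-byte UTF-8 value is 0xc280 (49792), so it cannot
collide with the special-key range.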

This short C snippet may help clarify the above:

-----------------------------------------------------------------------
#include <locale.h>
#include <ncurses.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int unichar;

int main(int argc, char *argv[])
{
  setlocale(LC_ALL, "en_US.UTF-8");        /* make sure UTF8       */
  initscr();                               /* start curses mode    */
  raw();
  keypad(stdscr, TRUE);                    /* pass special keys    */
  unichar = getch();                       /* read terminal        */

  mvprintw(LINES - 1, 0, "Key pressed is = %4x ", unichar);

  refresh();
  getch();                                 /* wait                 */
  endwin();                                /* leave curses mode    */
  return 0;
}
-----------------------------------------------------------------------

Hopefully you have access to a C compiler:

$ gcc uni00.c -o uni00 -lncurses

Hope this helps... Whatever the case, please keep me posted.

CJ


