Jython: Upper-ASCII characters '\351' from chr(233)

Maurice Bauhahn bauhahnm at clara.net
Wed Apr 25 17:00:23 EDT 2001


Thank you, Steve. Maybe it would help if I explained what I am doing. I'm
trying to write a programme to transcode eight-bit encodings of the
Cambodian/Khmer language to Unicode and then do letter-pair frequency
studies. Since I will be comparing characters (not integers) against the
keys, I need characters as the keys. That effort has in fact now been
successful. My next problem, discussed elsewhere in comp.lang.python, is to
import Unicode-escaped characters/strings from another 8-bit encoded file into a
Jython dictionary (presumably the solution is in the codecs module).
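A minimal sketch of the three pieces described above, written for modern Python 3 rather than Jython. It assumes the escaped file stores entries as ASCII text such as \u1780, which the standard `unicode_escape` codec can turn into real characters. `KHMER_MAP` is a hypothetical three-entry placeholder table, not a real Khmer legacy encoding, and `unescape`, `transcode` and `letter_pairs` are illustrative names.

```python
from collections import Counter

# Hypothetical placeholder table: byte value -> Unicode character.
# A real table would be read from the encoding file; these three
# entries only illustrate the idea.
KHMER_MAP = {
    0xA1: "\u1780",  # KHMER LETTER KA
    0xA2: "\u1781",  # KHMER LETTER KHA
    0xA3: "\u1782",  # KHMER LETTER KO
}


def unescape(field: str) -> str:
    """Turn ASCII escapes such as '\\u1780' into the characters they name."""
    return field.encode("ascii").decode("unicode_escape")


def transcode(data: bytes, table: dict) -> str:
    """Map each byte through table, passing plain ASCII bytes through unchanged."""
    out = []
    for b in data:
        if b in table:
            out.append(table[b])
        elif b < 0x80:
            out.append(chr(b))
        else:
            raise ValueError(f"no mapping for byte {b:#04x}")
    return "".join(out)


def letter_pairs(text: str) -> Counter:
    """Count adjacent character pairs (bigrams) in text."""
    return Counter(zip(text, text[1:]))


# An escaped table entry, as it might appear in a text file.
assert unescape("\\u1780") == KHMER_MAP[0xA1]

text = transcode(bytes([0xA1, 0xA2, 0xA1, 0xA2]), KHMER_MAP)
print(letter_pairs(text).most_common(3))
```

Because the pairs are tuples of single-character strings, they can be compared and used as dictionary keys exactly like the single characters discussed in this thread.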

Steve Holden wrote:

> "Maurice Bauhahn" <bauhahnm at clara.net> wrote in message
> news:mailman.988158281.19384.python-list at python.org...
> > Thank you very much for your persistent help.
> >
> > I was able to get the 8th bit characters to act as keys...with a somewhat
> > complex construction: chr(int(linesplit[0])). Linesplit had decimal
> > numbers in text format.
> >
> Would this shed any light on your original question or help in solving your
> problem more compactly? Note that this is CPython, not Jython, but
> portability should make all this work in both implementations. From what
> I've read, it seems to be your need to see decimal numbers in the source
> which led you to these contortions.
>

This solved the problem I first encountered (which was probably an artifact of
something on the same line!).
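The '\351' in the subject line is simply the octal spelling of chr(233), so the chr(int(linesplit[0])) construction and a literal escape produce the same key. A small sketch in Python 3 (an assumption; the thread uses Jython/CPython 2), where repr() prints the character itself rather than the '\351' shown by the older interpreter in this thread:

```python
# Field 0 of each line in the encoding file holds a decimal code as text.
field = "233"
key = chr(int(field))

# chr(233), the octal escape '\351' and the hex escape '\xe9' all denote the
# same single character, so they are interchangeable as dictionary keys.
assert key == "\351" == "\xe9"
assert ord(key) == 233 == 0o351

table = {key: "Two hundred thirty-three"}
print(table["\351"])
print(repr(key))
```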

>
> Your original assertion that
>
> """
> >>> chr(127)
> '?' (in fact a character like a house)
> """
>
> is quite correct, but I don't see why a weird printable representation makes
> a character unsuitable for use as a dictionary key. Maybe I missed your
> point. Anyway ...
>

I do not worry about the shape of the characters (why, those of Khmer are much
more novel in any case ;-)).

>
> >>> # Construct a string of all chars from 0 to 255
> >>> chars = "".join(map(chr, [i for i in range(256)]))
> >>> # Use decimal value to access single characters
> >>> # and use them as dictionary keys
> >>> dict = {}
> >>> dict[chars[233]] = "Two hundred thirty-three"
> >>> dict[chars[27]] = "escape"
> >>> dict["\033"]
> 'escape'
> >>> dict["\351"]
> 'Two hundred thirty-three'
> >>> dict
> {'\033': 'escape', '\351': 'Two hundred thirty-three'}
> >>>
>
> In other words, having constructed the chars string, you can index it with
> decimal numbers to get the characters you want. chars could equally have
> been a list of single-character strings, with the same effect.

Yes, it is the list of single-character strings that I am using (now
successfully).
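A sketch (Python 3 assumed) showing that the string table from Steve's transcript and a list of single-character strings index identically. It passes range() straight to map() instead of the redundant list comprehension, and names the dictionary `names` (a hypothetical name) rather than `dict`, so the built-in type is not shadowed.

```python
# Two equivalent lookup tables indexed by decimal code 0-255.
chars_str = "".join(map(chr, range(256)))   # one 256-character string
chars_list = [chr(i) for i in range(256)]   # 256 one-character strings

# Indexing either gives the same single-character string.
assert chars_str[233] == chars_list[233] == chr(233)
assert chars_str[27] == chars_list[27] == "\033"

names = {chars_list[233]: "Two hundred thirty-three", chars_list[27]: "escape"}
print(names["\033"])
```

Since chr(i) already returns the same one-character string, the table mainly saves repeated calls; either form gives identical dictionary keys.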

>
>
> If this doesn't help you at all, please feel free to ignore my rantings.
>

Thank you for the questions and desire to help!

>
> regards
>  Steve
>
> > # Fragment: encodingline, encodedict and logerror are defined elsewhere.
> > from re import split        # split(pattern, string)
> > from string import strip
> >
> > linesplit = split('\t', encodingline)
> > if len(linesplit) > 5:
> >     try:
> >         templist = linesplit[2:4]
> >         templist.append(split(';|:', linesplit[4]))
> >         templist.append(strip(linesplit[5]))
> >         encodedict[chr(int(linesplit[0]))] = templist
> >         print templist
> >     except ValueError:
> >         logerror('My error', linesplit[0])
> > else:
> >     logerror('Not >5 fields long', linesplit)
> >
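For reference, a self-contained Python 3 sketch of the fragment above, assuming `split` is re.split (its arguments are pattern then string, and ';|:' is a regex alternation). The function name `parse_encoding_lines` and the sample lines are hypothetical, and problem lines are collected in a list because logerror() is not shown in the thread.

```python
import re


def parse_encoding_lines(lines):
    """Build {char: [field2, field3, [parts of field4], field5]} from
    tab-separated lines whose first field is a decimal byte value (0-255).

    Problem lines are collected in a list rather than passed to logerror(),
    whose definition is not shown in the thread.
    """
    encodedict = {}
    errors = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= 5:
            errors.append(("Not >5 fields long", fields))
            continue
        try:
            code = int(fields[0])
            if not 0 <= code <= 255:
                # Python 3's chr() accepts any code point, so enforce the
                # 8-bit range explicitly (Python 2's chr() raised ValueError).
                raise ValueError("not an 8-bit code")
        except ValueError:
            errors.append(("My error", fields[0]))
            continue
        record = fields[2:4]
        record.append(re.split(";|:", fields[4]))
        record.append(fields[5].strip())
        encodedict[chr(code)] = record
    return encodedict, errors


# Hypothetical sample lines; the real file's column meanings are not given.
sample = [
    "233\tf1\tf2\tf3\ta;b:c\t f5 \n",
    "abc\tf1\tf2\tf3\tf4\tf5\n",
    "65\ttoo\tshort\n",
]
table, problems = parse_encoding_lines(sample)
print(table["\351"])
print(problems)
```

Narrowing the try block to the int() conversion keeps a ValueError raised elsewhere from being misreported as a bad code.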

--
Maurice Bauhahn