Jython: Upper-ASCII characters '\351' from chr(233)

Tue Apr 24 20:23:07 EDT 2001

Thank you very much for your persistent help.

I was able to get the 8-bit characters (those with the eighth bit set) to
act as dictionary keys, using a somewhat complex construction:
chr(int(linesplit[0])). The first field of linesplit holds the character
code as a decimal number in text form.

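# split(pattern, string) is presumably re.split, since ';|:' below is a regex.
from re import split
from string import strip   # Python 2 / Jython 2.0 string-module function

# Runs once per encodingline; encodedict and logerror() are defined elsewhere.
linesplit = split('\t', encodingline)
if len(linesplit) > 5:
    try:
        templist = linesplit[2:4]
        templist.append(split(';|:', linesplit[4]))
        templist.append(strip(linesplit[5]))
        # The decimal text in field 0 becomes the one-character key.
        encodedict[chr(int(linesplit[0]))] = templist
        print templist
    except ValueError:
        logerror('My error', linesplit[0])
else:
    logerror('Not >5 fields long', linesplit)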
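
For anyone reading this later, here is a minimal, self-contained sketch of
the same idea. It is written for a current CPython 3 interpreter rather than
Jython 2.0, and the sample lines, the field layout, and the names
build_encoding_dict and errors are assumptions made for illustration, not
the actual data file or program.

```python
import re


def build_encoding_dict(lines):
    """Map the character with a given decimal code to the remaining fields."""
    encodedict = {}
    errors = []  # hypothetical stand-in for the logerror() calls
    for encodingline in lines:
        linesplit = encodingline.split('\t')
        if len(linesplit) <= 5:
            errors.append(('Not >5 fields long', linesplit))
            continue
        try:
            # Decimal text such as '233' becomes the character chr(233).
            key = chr(int(linesplit[0]))
        except ValueError:
            errors.append(('Bad character code', linesplit[0]))
            continue
        templist = linesplit[2:4]
        templist.append(re.split(';|:', linesplit[4]))
        templist.append(linesplit[5].strip())
        encodedict[key] = templist
    return encodedict, errors


# Invented sample data: one good line, one bad code, one short line.
sample = [
    '233\tx\tfield2\tfield3\ta;b:c\t last \n',
    'oops\ta\tb\tc\td\te',
    '1\t2',
]
encodedict, errors = build_encoding_dict(sample)
print(encodedict['\351'])   # look up by the octal escape for code 233
print(len(errors))
```

Collecting the problems in a list instead of calling logerror() is only a
convenience for the sketch; the lookup works the same either way.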

D-Man wrote:

> On Fri, Apr 20, 2001 at 09:45:33PM +0100, Maurice Bauhahn wrote:
> | Thank you for the suggestion, D-Man.
> |
> | However, I doubt that this is a problem with the display, because I
> | can see all these unusual characters when I print a line of text to
> | the screen. The problem becomes obvious when I try one of those
> | upper ASCII characters as a key of the dictionary...it does not
> | work. My hope is to compare each character from a text file...and
>
> How do you know it doesn't work?  I have heard that all strings in
> Jython are Unicode because all Java strings are Unicode (or something
> like that).
>
> Say...I just tried it again, using Jython 2.0 and CPython 2.1.  If I
> type
>
> print chr( 233 )
>
> I get an accented e in CPython and something else from Jython, but not
> the '\351' from before.  Actually in CPython I get '\xe9' if I just
> call chr.  It might be a difference between str() and repr().
>
> If you can enter the character into your file, putting a 'u' in front
> of the string specifies it as unicode.  Ex :
>
> print u'é'
>
> Say, what if you use the 'unichr' function?  There might be a
> difference between chr and unichr (in CPython there is).
>
> Here is a snippet, CPython first, then Jython :
>
> >>> unichr( 8218 )
> u'\u201a'
> >>> print unichr( 8218 )
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> >>> ord( 'é' )
> 8218
> >>> unichr( 233 )
> '\351'
> >>> unichr( 8218 )
> u'\u201A'
> >>> print unichr( 8218 )
> é
> >>> print chr( 8218 )
> é
>
> | use the dictionary to assist in translation of those characters to
> | Unicode (the Cambodian script...so standard Java code converters are
> | not useful).
> |
> | Maybe I will have to call a Java function to accomplish my desired
> | task, right?
>
> Maybe.  I really don't have much experience with using Unicode or
> locale specific stuff.
>
> I hope my results give you some thoughts on how to solve your problem.
> -D
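
On the str() versus repr() point above: '\351' (octal) and '\xe9' (hex) are
two escape notations for the same character, code 233, so either one works
as a dictionary key. The sketch below checks this on a current CPython 3
interpreter, which is an assumption; Jython 2.0 and CPython 2.1 echo values
differently. The last part is only a guess at where the 8218 in the Jython
transcript might come from: byte 0x82 means e-acute in the DOS code page
cp437 but U+201A in Windows cp1252. The name table is hypothetical.

```python
# '\351' is an octal escape and '\xe9' a hex escape for code point 233.
assert '\351' == '\xe9' == chr(233)
assert ord('\351') == 0o351 == 0xE9 == 233

# ascii() shows an escaped form of the string, similar in spirit to what
# the interactive prompt echoes; the character itself is unchanged.
e_acute = chr(233)
print(ascii(e_acute))

# The notation used in the source does not matter for dictionary lookups.
table = {chr(233): 'e with acute accent'}
print(table['\351'])

# Guess (assumption): the console delivered byte 0x82, which is e-acute in
# cp437, and it was decoded as cp1252, where 0x82 is U+201A (decimal 8218).
raw = b'\x82'
as_dos = raw.decode('cp437')
as_windows = raw.decode('cp1252')
print(ord(as_dos), ord(as_windows))
```

If that guess is right, the odd numbers come from the console's code page
rather than from the dictionary itself.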

--
Maurice Bauhahn