Printing characters outside of the ASCII range

danielk danielkleinad at gmail.com
Sun Nov 11 08:42:35 EST 2012


On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad at gmail.com> wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> >   File "D:\home\python\pytest.py", line 1, in <module>
> >     print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode".  But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print.  The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually.  Is the data
> you get from your database a string, or a bytes object?  If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.

Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.

class Pytest(str):
    def __init__(self, data=None):
        if data is None:
            data = ""
        self.data = data

    def __repr__(self):
        return self.data.encode('cp437')

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
abc²def
>>> print(p.data)
abc²def
>>> print(type(p.data))
<class 'str'>

If I change '__repr__' to '__str__' then I get:

>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)

Why does '__str__' behave differently from '__repr__'? I'd like to use '__str__' because the result is not executable code; it's just a string of the record contents.
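
A minimal sketch of what may be going on, assuming Python 3. print() calls str() on its argument, and a str subclass with no '__str__' of its own inherits str.__str__, so the '__repr__' in the first version is probably never called at all. The class names WithRepr and WithStr and the helper outcome() are made up for this illustration, and the classes encode self directly instead of a separate data attribute to keep the example short.

```python
class WithRepr(str):
    def __repr__(self):
        # Returns bytes, which repr() rejects; kept to mirror the class above.
        return self.encode('cp437')


class WithStr(str):
    def __str__(self):
        # Same mistake, but in the method that print() and str() actually call.
        return self.encode('cp437')


def outcome(func, obj):
    """Return the result type name, or 'TypeError' if the call is rejected."""
    try:
        return type(func(obj)).__name__
    except TypeError:
        return 'TypeError'


text = "abc" + chr(178) + "def"
results = {
    'str(WithRepr)': outcome(str, WithRepr(text)),    # __repr__ is not involved
    'repr(WithRepr)': outcome(repr, WithRepr(text)),  # __repr__ returns bytes
    'str(WithStr)': outcome(str, WithStr(text)),      # __str__ returns bytes
}
for call, result in results.items():
    print(call, '->', result)
```

If this reading is right, both methods are equally strict about returning a str; the first version only appears to work because print() never reaches '__repr__'.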

The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type of p.data, it said <class 'str'>, which I take to mean 'string'. Or can a 'string' also be 'a string of bytes'?
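
A small sketch of the distinction, assuming Python 3, where str (text) and bytes are separate types. Calling encode() produces a new bytes object and leaves the original str untouched, which would explain why p.data still reports <class 'str'>. The variable names are only for illustration.

```python
text = "abc" + chr(178) + "def"      # a str: a sequence of Unicode code points
encoded = text.encode('cp437')       # a new bytes object; text is unchanged

type_names = (type(text).__name__, type(encoded).__name__)
print(type_names)

# U+00B2 (superscript two) is byte 0xFD (253) in CP437.
print(encoded == b'abc\xfddef')

# Decoding with the same codec gives back the original text.
print(encoded.decode('cp437') == text)
```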

I'm trying to get my head around all this codecs/Unicode stuff. I haven't had to deal with it until now, but I'm determined not to let it get the best of me :-)

My goals are:

a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. If they know the record format, they should be able to create a database object as in the example above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
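
One possible shape for those two goals, as a rough sketch and not a finished design. It assumes Python 3 and that every character the client passes in has a code point below 256, so a latin-1 encode turns code point N into byte N, and a CP437 decode then gives the character the terminal shows for that byte. The Record class and its attribute names are hypothetical.

```python
class Record:
    """Hypothetical record wrapper that converts chr(25x) delimiters internally."""

    def __init__(self, data=""):
        # Assumption: all code points are < 256, so latin-1 maps each one
        # to the byte with the same value (chr(253) becomes byte 0xFD).
        self.raw = data.encode('latin-1')
        # Interpret those bytes as CP437 to get the matching Unicode text.
        self.text = self.raw.decode('cp437')

    def __str__(self):
        # Must return str, never bytes.
        return self.text

    def __repr__(self):
        return "Record({!r})".format(self.raw)


rec = Record("abc" + chr(253) + "def")
print(repr(rec))          # ASCII-safe view of the raw record bytes
print(ascii(str(rec)))    # escaped view of the displayable text
```

Here str(rec) gives text with the delimiters intact for display, while rec.raw keeps the original byte values for anything that has to go back to the database.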


