Printing characters outside of the ASCII range
danielk
danielkleinad at gmail.com
Sun Nov 11 08:42:35 EST 2012
On Friday, November 9, 2012 5:11:12 PM UTC-5, Ian wrote:
> On Fri, Nov 9, 2012 at 2:46 PM, danielk <danielkleinad at gmail.com> wrote:
> > D:\home\python>pytest.py
> > Traceback (most recent call last):
> >   File "D:\home\python\pytest.py", line 1, in <module>
> >     print(chr(253).decode('latin1'))
> > AttributeError: 'str' object has no attribute 'decode'
> >
> > Do I need to import something?
>
> Ramit should have written "encode", not "decode". But the above still
> would not work, because chr(253) gives you the character at *Unicode*
> code point 253, not the character with CP437 ordinal 253 that your
> terminal can actually print. The Unicode equivalents of those
> characters are:
>
> >>> list(map(ord, bytes([252, 253, 254]).decode('cp437')))
> [8319, 178, 9632]
>
> So these are what you would need to encode to CP437 for printing.
>
> >>> print(chr(8319))
> ⁿ
> >>> print(chr(178))
> ²
> >>> print(chr(9632))
> ■
>
> That's probably not the way you want to go about printing them,
> though, unless you mean to be inserting them manually. Is the data
> you get from your database a string, or a bytes object? If the
> former, just do:
>
> print(data.encode('cp437'))
>
> If the latter, then it should be printable as is, unless it is in some
> other encoding than CP437.
Ian's solution gives me what I need (thanks Ian!). But I notice a difference between '__str__' and '__repr__'.
class Pytest(str):
    def __init__(self, data=None):
        if data is None:
            data = ""
        self.data = data
    def __repr__(self):
        return (self.data).encode('cp437')
>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
abc²def
>>> print(p.data)
abc²def
>>> print(type(p.data))
<class 'str'>
If I change '__repr__' to '__str__' then I get:
>>> import pytest
>>> p = pytest.Pytest("abc" + chr(178) + "def")
>>> print(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type bytes)
Why is '__str__' behaving differently from '__repr__'? I'd like to be able to use '__str__' because the result is not executable code; it's just a string of the record contents.
The documentation for the 'encode' method says: "Return an encoded version of the string as a bytes object." Yet when I displayed the type of p.data, it said <class 'str'>, which I'm taking to be 'type string'. Or can a 'string' also be 'a string of bytes'?
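For reference, a minimal sketch of the distinction in question, assuming Python 3. The names `Record` and `BytesRepr` are hypothetical stand-ins, not the original `Pytest` class. It shows that `encode` returns a new `bytes` object and leaves the original `str` untouched, and that both `__str__` and `__repr__` are required to return `str`.

```python
class Record:
    """Hypothetical stand-in for the Pytest class above (not the original code)."""

    def __init__(self, data=""):
        self.data = data

    def __str__(self):
        # __str__ must return str, so hand back the text itself, not encoded bytes.
        return self.data


class BytesRepr:
    """Hypothetical class whose __repr__ returns bytes, like the first version above."""

    def __repr__(self):
        return "x".encode("cp437")


text = "abc" + chr(178) + "def"
encoded = text.encode("cp437")   # encode() produces a new bytes object

print(type(text))       # the original is still a str
print(type(encoded))    # the encoded copy is bytes
print(encoded)          # bytes print as a b'...' literal

rendered = str(Record(text))
print(rendered == text)

# repr() enforces the same rule as str(): a bytes return value is rejected.
try:
    repr(BytesRepr())
    repr_rejected = False
except TypeError:
    repr_rejected = True
print(repr_rejected)
```

On this reading, the '__repr__' version only appeared to work because print() calls '__str__', not '__repr__'. Since Pytest subclasses str, print(p) used the inherited str.__str__ and the bytes-returning '__repr__' was never invoked. Likewise, type(p.data) is str because self.data itself is never encoded; only the return value of the method is.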
I'm trying to get my head around all this codecs/unicode stuff. I haven't had to deal with it until now but I'm determined to not let it get the best of me :-)
My goals are:
a) display a 'raw' database record with the delimiters intact, and
b) allow the client to create a string that represents a database record. So, if they know the record format, they should be able to create a database object as in the example above, but with the chr(25x) characters. I will handle the conversion of the chr(25x) characters internally.
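One possible shape for that internal conversion, as a rough sketch only. The names `CP437_DELIMS` and `to_display` are hypothetical, and the delimiter values 252 to 254 are assumed from the example quoted earlier in this message. The idea is to map each chr(25x) code point to the character CP437 assigns to that byte value, keeping everything as str until output.

```python
# Hypothetical names; the byte values 252-254 are assumed from the quoted example.
# Maps each code point to the character CP437 uses for the same byte value.
CP437_DELIMS = {code: bytes([code]).decode("cp437") for code in (252, 253, 254)}


def to_display(record):
    """Replace chr(252), chr(253), chr(254) with their CP437 display characters."""
    # str.translate accepts a dict of {code point: replacement string}.
    return record.translate(CP437_DELIMS)


raw = "abc" + chr(253) + "def"      # what the client would write
shown = to_display(raw)             # still a str, now safe to encode as CP437

print(ascii(shown))                 # ASCII-only view of the converted text
print(shown.encode("cp437"))        # the byte a CP437 terminal would receive
```

With this approach the class could return `to_display(self.data)` from '__str__', since the result is a str, and leave the final encoding to the output stream.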