A 'raw' codec for binary "strings" in Python?
Francis Avila
francisgavila at yahoo.com
Wed Mar 3 16:05:07 EST 2004
In <mailman.9.1078199710.12614.python-list at python.org> Bill Janssen
wrote:
>> You could use
>> "\xc0".decode("iso-8859-1").encode('US-ASCII', 'replace')
>
> Yes, this is what I'm doing at the moment. But it seems a real hack.
> The string *isn't* in Latin-1; it's binary, it's data, and there
> should be a way of saying that. Maybe a third kind of string type?
>
> Bill
>
The "raw binary" datatype is the str object. The "text" datatype is the
unicode object. Yes, I know that str is more often than not also used
for ASCII text, but this is historical and should ideally go away.
Probably Python 3k will replace str with unicode (possibly calling it
"string"), and grow a new datatype for raw binary stuff, with
appropriate methods. But no one has really given enough thought to this
yet.
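For comparison, here is a minimal sketch written in Python 3 syntax. That is an assumption on my part, since Python 3 postdates this thread, but it did end up with a `bytes` type for raw binary data and a `str` type for text. The sketch shows two things. First, Latin-1 (ISO-8859-1) maps byte values 0-255 one-to-one onto code points U+0000-U+00FF, so it round-trips arbitrary binary data losslessly and can stand in for a "raw" codec. Second, the quoted decode/encode workaround is lossy: it replaces every byte >= 0x80 with '?'.

```python
# Sketch in Python 3 syntax (assumption: this thread predates Python 3),
# where bytes is the raw binary type and str is the text type.

data = b"\xc0\x00\xff"  # arbitrary binary data, not text in any encoding

# Latin-1 maps each byte value 0-255 to the code point with the same number,
# so decoding and re-encoding gives back exactly the original bytes.
text = data.decode("iso-8859-1")
assert [ord(c) for c in text] == list(data)
assert text.encode("iso-8859-1") == data

# The quoted workaround: force the data into ASCII. Any byte >= 0x80 becomes
# '?', so the original value is lost.
ascii_only = b"\xc0".decode("iso-8859-1").encode("ascii", "replace")
print(ascii_only)  # b'?'

# Treated as raw binary, bytes needs no codec at all: indexing yields ints.
print(data[0])  # 192
```

Using Latin-1 this way only avoids the lossy step; it does not make the data any less binary, which is the distinction being argued for here.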
You can sort of get this now by running python with the -U flag, which
treats plain '...' literals as unicode:
destaco:~ favila$ python -U
Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> type('')
<type 'unicode'>
>>>
It breaks all over the place, though, because a lot of existing code
assumes that '' is a str.
--
Francis Avila