A 'raw' codec for binary "strings" in Python?

Francis Avila francisgavila at yahoo.com
Wed Mar 3 16:05:07 EST 2004


In <mailman.9.1078199710.12614.python-list at python.org> Bill Janssen  
wrote:
>> You could use
>>     "\xc0".decode("iso-8859-1").encode('US-ASCII', 'replace')
> 
> Yes, this is what I'm doing at the moment.  But it seems a real hack.
> The string *isn't* in Latin-1; it's binary, it's data, and there
> should be a way of saying that.  Maybe a third kind of string type?
> 
> Bill
> 

The "raw binary" datatype is the str object: it is just a sequence of 
bytes with no encoding attached.  The "text" datatype is the unicode 
object.  Yes, I know that str is more often than not also used for 
ASCII text, but this is historical and ideally should go away.
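
To see why the Latin-1 trick quoted above is harmless on binary data, 
here is a small sketch.  ISO-8859-1 maps byte N to code point N, so 
decoding any byte string succeeds and round-trips without loss.  The 
sample data and variable names are made up for illustration, and the 
bytes are built via bytearray so the snippet also runs on later Pythons 
(on 2.3 a plain str does the same job).

```python
# Hypothetical sample: every possible byte value, treated as opaque data.
data = bytes(bytearray(range(256)))

# ISO-8859-1 maps byte N to code point N, so this never raises.
text = data.decode("iso-8859-1")
code_points = [ord(ch) for ch in text]

# The round trip back to bytes is lossless.
round_tripped = text.encode("iso-8859-1")

# Encoding to ASCII with 'replace' turns each non-ASCII character into '?'.
ascii_only = text.encode("ascii", "replace")
```

So the decode step is not really claiming the data "is" Latin-1; it is 
only borrowing the one codec that happens to be an identity mapping on 
byte values.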

Probably Python 3k will replace str with unicode (possibly calling it 
"string"), and grow a new datatype for raw binary stuff, with 
appropriate methods.  But no one has really given enough thought to this 
yet.
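
As a rough sketch of what such a split could look like: the snippet 
below assumes an interpreter where text literals are Unicode and a 
separate bytes type holds raw data, which is the division Python 3 
later shipped with.  It will not run on 2.3, and the sample values are 
made up.

```python
# Assumes Python 3 semantics: str is text, bytes is raw binary data.
raw = b"\xc0\xff\x00"      # binary: a sequence of integers 0..255
text = "caf\u00e9"         # text: a sequence of Unicode code points

# Mixing the two is an error rather than an implicit ASCII decode.
try:
    raw + text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Crossing the boundary always names an encoding explicitly.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
```

The point is that binary data never needs to pretend to be in some 
character set; an encoding only shows up where text is turned into 
bytes or back.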

You can sort of get the first half of this now (string literals become 
unicode objects) by running python with the -U flag:

destaco:~ favila$ python -U
Python 2.3 (#1, Sep 13 2003, 00:49:11) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> type('')
<type 'unicode'>
>>> 

It breaks all over the place, though.
-- 
Francis Avila 


