A 'raw' codec for binary "strings" in Python?

Michael Hudson mwh at python.net
Tue Mar 2 06:46:29 EST 2004


Bill Janssen <janssen at parc.com> writes:

> I've encountered an issue dealing with strings read from files.  I
> read a line from a file, then try to print it out as an ASCII string:
> 
>   line = fp.readline()
>   print line.encode('US-ASCII', 'replace')
> 
> and of course I get an error like:
> 
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in ?
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 1: ordinal not in range(128)
> 
> because the file contained some binary character.  You'll notice that
> the problem is in *decoding* the string, not in re-encoding it,
> because I'm using the default "C" locale, and "US-ASCII" is presumed
> for strings.

Actually, the "C" locale has precisely nothing to do with it.
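
A rough sketch of where the error comes from, assuming the Python 2 behaviour in which calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec, whatever the locale. It is written for Python 3, where that decode step has to be spelled out, and the sample bytes are made up for illustration:

```python
# Hypothetical sample line containing the non-ASCII byte from the traceback.
raw = b"a\xd5b\n"

try:
    # Python 2's str.encode() performed this decode implicitly before encoding.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with an error handler does not raise; bad bytes become U+FFFD.
text = raw.decode("ascii", "replace")

# Re-encoding to ASCII with 'replace' turns U+FFFD into '?'.
printable = text.encode("ascii", "replace")
```

The 'replace' handler passed to encode() never gets a chance to run in the original snippet, because the failure happens in the implicit decode before it.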

> But these strings are *not* US-ASCII, they are raw bytes.  How do I
> format a string of raw bytes for conversion to a recognized charset
> encoding for printing?

You don't?

Wouldn't 

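import string

def m(c):
    # Keep printable characters; map everything else to '?'.
    if c in string.printable:
        return c
    else:
        return '?'

# 256-entry translation table, one character per possible byte value.
t = ''.join([m(chr(o)) for o in range(256)])

line.translate(t)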

make more sense?
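
For readers on Python 3, the same translation-table idea might look roughly like the sketch below. It assumes the line was read in binary mode (so it is a bytes object), and the names table and sanitize are illustrative, not part of the original message:

```python
import string

def m(c):
    # Keep printable ASCII characters; map everything else to '?'.
    return c if c in string.printable else "?"

# 256-entry table indexed by byte value, as bytes.translate() expects.
table = bytes(ord(m(chr(o))) for o in range(256))

def sanitize(line):
    # Replace every non-printable byte in a bytes object with b'?'.
    return line.translate(table)

cleaned = sanitize(b"a\xd5b\n")
```

Since the table is built once, each call to sanitize() is a single pass over the bytes with no codec involved.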

Cheers,
mwh

-- 
  I like silliness in a MP skit, but not in my APIs. :-)
                                       -- Guido van Rossum, python-dev


