A 'raw' codec for binary "strings" in Python?
Michael Hudson
mwh at python.net
Tue Mar 2 06:46:29 EST 2004
Bill Janssen <janssen at parc.com> writes:
> I've encountered an issue dealing with strings read from files. I
> read a line from a file, then try to print it out as an ASCII string:
>
> line = fp.readline()
> print line.encode('US-ASCII', 'replace')
>
> and of course I get an error like:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 1: ordinal not in range(128)
>
> because the file contained some binary character. You'll notice that
> the problem is in *decoding* the string, not in re-encoding it,
> because I'm using the default "C" locale, and "US-ASCII" is presumed
> for strings.
Actually, the "C" locale has precisely nothing to do with it.
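The failure comes from an implicit decode: calling .encode() on a byte string first decodes it with Python's default encoding (ASCII), whatever the locale is. A minimal sketch of those two steps made explicit, written in Python 3 syntax, where bytes and text are separate types; the sample bytes are made up for illustration:

```python
raw = b"a\xd5b\n"  # hypothetical line containing one non-ASCII byte

try:
    raw.decode("ascii")  # the step that was performed implicitly
except UnicodeDecodeError as exc:
    failure = str(exc)

# Decoding with errors="replace" never raises: the bad byte becomes U+FFFD,
# which the later encode step turns into "?".
text = raw.decode("ascii", "replace")
printable = text.encode("ascii", "replace")
```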
> But these strings are *not* US-ASCII, they are raw bytes. How do I
> format a string of raw bytes for conversion to a recognized charset
> encoding for printing?
You don't?
Wouldn't
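import string

def m(c):
    # Keep printable characters; replace everything else with '?'.
    if c in string.printable:
        return c
    else:
        return '?'

# 256-entry translation table, one entry per possible byte value.
t = ''.join([m(chr(o)) for o in range(256)])
line.translate(t)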
make more sense?
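In Python 3, where data read from a file in binary mode is a bytes object, the same idea might be sketched as follows. The names PRINTABLE, TABLE and printable_only are made up for this illustration:

```python
import string

# Byte values of the printable ASCII characters (includes whitespace).
PRINTABLE = set(string.printable.encode("ascii"))

# 256-entry table: printable bytes map to themselves, the rest to "?".
TABLE = bytes(b if b in PRINTABLE else ord("?") for b in range(256))

def printable_only(raw):
    """Return raw with every non-printable byte replaced by b'?'."""
    return raw.translate(TABLE)
```

Something like printable_only(fp.readline()) would then work on a file opened with mode 'rb'; no codec is involved at any point, which is the attraction of the translate approach.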
Cheers,
mwh
--
I like silliness in a MP skit, but not in my APIs. :-)
-- Guido van Rossum, python-dev