A 'raw' codec for binary "strings" in Python?

Bill Janssen janssen at parc.com
Mon Mar 1 16:05:31 EST 2004


I've encountered an issue dealing with strings read from files.  I
read a line from a file, then try to print it out as an ASCII string:

  line = fp.readline()
  print line.encode('US-ASCII', 'replace')

and of course I get an error like:

  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 1: ordinal not in range(128)

because the file contained a non-ASCII byte.  You'll notice that the
problem is in *decoding* the string, not in re-encoding it: calling
encode() on a byte string first decodes it to Unicode, and because I'm
using the default "C" locale, "US-ASCII" is presumed for that step.
But these strings are *not* US-ASCII, they are raw bytes.  How do I
mark a string as raw bytes so that it can be converted to a
recognized charset encoding for printing?
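
To make the hidden decode step visible, here is a minimal sketch that
performs both conversions explicitly.  It assumes Python 3, where byte
strings are written b"..." and nothing is decoded implicitly; the names
raw and to_ascii are made up for illustration and the sample bytes are
hypothetical.

```python
# Hypothetical input: one byte (0xd5) outside the ASCII range.
raw = b"a\xd5b"

# A strict ASCII decode fails on that byte, which is the error shown above.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True


def to_ascii(data):
    """Convert bytes to ASCII bytes, substituting for anything non-ASCII."""
    # Step 1: bytes -> text. 'replace' turns undecodable bytes into U+FFFD.
    text = data.decode("ascii", "replace")
    # Step 2: text -> ASCII bytes. 'replace' turns unencodable characters into '?'.
    return text.encode("ascii", "replace")


result = to_ascii(raw)
```

With the error handler applied at the decode step as well, the
conversion no longer raises, though the original byte value is lost.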

There seems to be no 'raw' codec that would capture this.  There's no
way of setting an attribute on a file to express this.  It looks like
the best I can do is

  print ''.join([(0 < ord(x) < 0x7F) and x or (r"\x%02x" % ord(x)) for x in line])

which seems extremely inefficient.
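
One possible direction, sketched here under the assumption of Python 3:
Latin-1 maps each byte value 0-255 to the code point with the same
number, so decoding with it never fails and it can stand in for a 'raw'
codec.  The helper names as_text and escape_raw are illustrative only,
and the printable range used (0x20 to 0x7E) is an assumption that is
narrower than the test in the one-liner above.

```python
def as_text(data):
    """Decode bytes one-to-one into text; Latin-1 accepts every byte value."""
    return data.decode("latin-1")


def escape_raw(data):
    """Return text in which every byte outside 0x20-0x7E is written as \\xNN."""
    parts = []
    for b in bytearray(data):  # iterating a bytearray yields integers
        if 0x20 <= b < 0x7F:
            parts.append(chr(b))           # printable ASCII passes through
        else:
            parts.append("\\x%02x" % b)    # everything else is escaped
    return "".join(parts)


escaped = escape_raw(b"a\xd5\x00b")
```

as_text keeps every byte recoverable (encoding the result back with
"latin-1" returns the original bytes), while escape_raw produces output
that is safe to print on an ASCII-only terminal.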

Bill



