"More About Unicode in Python 2 and 3"

Tim Chase python.list at tim.thechases.com
Mon Jan 6 06:49:28 EST 2014


On 2014-01-06 15:51, Chris Angelico wrote:
> >>> data = b"\x43\x6c\x67\x75\x62\x61" # is there an easier way to
> >>> turn a hex dump into a bytes literal?

Depends on how you source them:


# space separated:
>>> s1 = "43 6c 67 75 62 61"
>>> ''.join(chr(int(pair, 16)) for pair in s1.split())
'Clguba'

# all smooshed together:
>>> s2 = s1.replace(' ','')
>>> s2
'436c67756261'
>>> ''.join(chr(int(s2[i*2:(i+1)*2], 16)) for i in range(len(s2)/2))
'Clguba'

# as \xHH escaped:
>>> s3 = ''.join('\\x'+s2[i*2:(i+1)*2] for i in range(len(s2)/2))
>>> print(s3)
\x43\x6c\x67\x75\x62\x61
>>> print(b3)
b'\\x43\\x6c\\x67\\x75\\x62\\x61'
>>> b3.decode('unicode_escape')
'Clguba'

It might get more complex if you're not just dealing with bytes, or
if you have some other encoding scheme, but "s1" (space-separated, or
some other delimiter such as colon-separated that can be passed
to the .split() call) and "s2" (all smooshed together) are the two I
encounter most frequently.

-tkc








More information about the Python-list mailing list