need help understanding: converting text to binary

Eli the Bearded * at eli.users.panix.com
Mon Apr 22 20:54:24 EDT 2019


Here's some code I wrote today:

------ cut here 8< ------
HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
            b'A', b'B', b'C', b'D', b'E', b'F',
            b'a', b'b', b'c', b'd', b'e', b'f')


# decode a single hex digit
def hord(c):
    c = ord(c)
    if c >= ord(b'a'):
        return c - ord(b'a') + 10
    elif c >= ord(b'A'):
        return c - ord(b'A') + 10   # uppercase digits must be offset from 'A', not 'a'
    else:
        return c - ord(b'0')


# decode quoted printable, specifically the MIME-encoded words
# variant which is slightly different than the body text variant
def decodeqp(v):
    out = b''
    state = ''             # used for =XY decoding
    for c in list(bytes(v,'ascii')):
        c = bytes((c,))

        if c == b'=':
            if state == '':
                state = '='
            else:
                raise ValueError
            continue

        if c == b'_':       # underscore is space only for MIME words
            if state == '':
                out += b' '
            else:
                raise ValueError
            continue

        if c in HEXCHARS:
            if state == '':
                out += c
            elif state == '=':
                state = hord(c)
            else:
                state *= 16
                state += hord(c)
                out += bytes((state,))
                state = ''
            continue

        if state == '':
            out += c
        else:
            raise ValueError
        continue

    if state != '':
        raise ValueError

    return out
------ >8 cut here ------
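Incidentally, the hord() helper can lean on the built-in int(). When a base is given, int() parses digits in that base and accepts bytes as well as str. Here is a minimal sketch that keeps the name hord() from the code above:

```python
def hord(c):
    """Decode a single hex digit given as a one-byte bytes object."""
    # With an explicit base, int() accepts str, bytes, or bytearray.
    # For a single byte it raises ValueError unless that byte is a hex digit,
    # so the separate HEXCHARS check becomes optional.
    return int(c, 16)

print(hord(b'A'), hord(b'a'), hord(b'7'))  # 10 10 7
```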

It works, in the sense that

     print(decodeqp("=21_yes"))

will output

     b'! yes'

But the bytes() thing is really confusing me. Most of this is translated
from C code I wrote some time ago. I'm new to python and did spend some
time reading:

https://docs.python.org/3/library/stdtypes.html#bytes-objects

Why does "bytes((integertype,))" work? I'll freely admit to stealing
that trick from /usr/lib/python3.5/quopri.py on my system. (Why am I not
using quopri? Well, (a) I want to learn, (b) it decodes to a file
not a variable, (c) I want different error handling.)

Is there a more python-esque way to convert what should be plain ascii
into a binary "bytes" object? In the use case I'm working towards the
charset will not be ascii or UTF-8 all of the time, and the charset
isn't the responsibility of the python code. Think "decode this if
charset matches user-specified value, then output in that same charset;
otherwise do nothing."

Elijah
------
has yet to warm up to this language


