Most pythonic way to implement byte stuffing algorithm

Travis Griggs travisgriggs at gmail.com
Tue Apr 17 12:02:52 EDT 2018


I posted this on SO, but… yeah… 

I'm doing some serial protocol stuff and want to implement a basic byte stuffing algorithm in python. Though really what this really generalizes to is “what is the most pythonic way to transform one sequence of bytes where some bytes are passed through 1:1, but others are transformed to longer subsequences of bytes?” I’m pretty sure this rules out the use of transform() which expects a 1:1 mapping.


So far, I've come with 5 different approaches, and each of them has something I don't like about it:

1 Via Generator

def stuff1(bits):
    for byte in bits:
        if byte in _EscapeCodes:
            yield PacketCode.Escape
            yield byte ^ 0xFF
        else:
            yield byte

This may be my favorite, but maybe just because I'm kind of fascinated by yield based generators. I worried that the generator would make it slow, but it's actually the second fastest of the bunch.


2 Simply bytes()

def stuff2(bits):
    result = bytes()
    for byte in bits:
        if byte in _EscapeCodes:
            result += bytes([PacketCode.Escape, byte ^ 0xFF])
        else:
            result += bytes([byte])
        return result

Constantly has to create single element arrays just to throw them out because I'm not aware of any "copy with one additional element" operation. It ties for the slowest of the bunch.


3 Use bytearray()

def stuff3(bits):
    result = bytearray()
    for byte in bits:
        if byte in _EscapeCodes:
            result.append(PacketCode.Escape)
            result.append(byte ^ 0xFF)
        else:
            result.append(byte)
    return result

Seems better than the direct bytes() approach. Actually slower than the yield generator and can do one byte at a time (instead of needing intermediate 1 element collections). But it feels brutish. It's middle of the pack performance.


4 BytesIO()

def stuff4(bits):
    bio = BytesIO()
    for byte in bits:
        if byte in _EscapeCodes:
            bio.write(bytes([PacketCode.Escape, byte ^ 0xFF]))
        else:
            bio.write(bytes([byte]))
    return bio.getbuffer()

I like the stream based approach here. But it is annoying that there doesn't seem to be something like a write1() API that could just add 1 byte, so I have to make those intermediate bytes again. If there was a "write single byte", I'd like this one. It ties for slowest.


5 Use replace()

def stuff5(bits):
    escapeStuffed = bytes(bits).replace(bytes([PacketCode.Escape]), bytes([PacketCode.Escape, PacketCode.Escape ^ 0xFF]))
    stopStuffed= escapeStuffed.replace(bytes([PacketCode.Stop]), bytes([PacketCode.Escape, PacketCode.Stop ^ 0xFF]))
    return stopStuffed.replace(bytes([PacketCode.Start]), bytes([PacketCode.Escape, PacketCode.Start ^ 0xFF]))

This is the fastest. But I don't like the way the code reads and the intermediate sweeps.



More information about the Python-list mailing list