Working with binary data, S-records (long)

Thu Mar 20 13:58:30 EST 2003

We have a program to download (flash) EPROM images to an embedded device.
It reads the image data from a file in Motorola S-record format, augments
the image with some device specific data and sends the thing via a serial
port to the device, usually after downloading a small piece of software,
a second-stage bootloader, first, in the same manner. This program is
currently written in C and runs under Linux and in a DOS shell under older
Windows versions. Over the years said program has accumulated a lot of cruft
and grown command line options like weeds. Now my boss has asked me to write a
new version that offers a GUI and also runs under Windows 2000 and XP.
Which is perfectly fine with me, 'cause after finding pyserial that'll be
more like a fun job (I've already written a similar thing in Python that
reads an ELF program file and downloads the needed section to another
device - it worked in a few hours; a C version would have taken weeks to
write and debug).

Time to get to the topic. Reading S-records in Python is not all that much
fun (OK, it isn't in C either). I've thought about doing it in C and
returning a string, but that would lose the address information. And
creating more complex Python data types in C is something I've never done.
And I don't want to compile under Windows. Thus I wrote a pure Python
reader, which looks like this (this is the whole class so far):
--------------------------
import operator

class SRecord:
    def __init__(self, init=0xff, checkcs=True):
        self.udata  = []
        self.data   = []
        self.tail   = {}
        self.offset = 0
        self.size   = 0
        self.start  = None
        self.comm   = []
        self.init   = init
        self.checkcs = checkcs   # readrecords() tests self.checkcs

    def readrecord(self, line):
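        """Lese eine Zeile als S-Record und gebe Adresse, Daten und Prüfsumme zurück.
        (Parse one line as an S-record; return record type, address, data and a
        checksum residual that is 0 for a valid record.)"""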
        type = line[:2]
        data = [int(line[i:i + 2], 16) for i in range(2, len(line), 2)]
        cs   = (reduce(operator.add, data) + 1) & 0xff  # Muß 0 ergeben (must be 0)
        if type in ('S1', 'S9'):
            adr = (data[1] << 8) + data[2]
            fd  = 3
        elif type in ('S2', 'S8'):
            adr = (data[1] << 16) + (data[2] << 8) + data[3]
            fd  = 4
        elif type in ('S3', 'S7'):
            # data[0] is the byte count; the 32-bit address is data[1..4]
            adr = (long(data[1]) << 24) + (data[2] << 16) + (data[3] << 8) + data[4]
            fd  = 5
        elif type == 'S0':      # Kommentar (header/comment record)
            return 'C', 0, data[3:-1], cs
        else:                   # S4 is reserved; S5/S6 (record counts) are not handled
            raise ValueError("Not a valid S-record")
        if type > 'S6':         # Startadresse (start address: S7, S8, S9)
            type = 'S'
        else:                   # Daten (data: S1, S2, S3)
            type = 'D'
        return type, adr, data[fd:-1], cs

    def readrecords(self, records):
        """Eine Liste (Zeilen) von S-Records lesen. (Read a list of S-record lines.)"""
        recno = -1
        for line in records:
            recno += 1
            line = line.rstrip()
            if not line:        # skip blank lines instead of failing on them
                continue
            type, adr, data, cs = self.readrecord(line)
            if cs and self.checkcs:
                raise ValueError("Checksum error in record %d" % recno)
            if type == 'D':
                self.udata.append((adr, data))
            elif type == 'S':
                self.start = adr
            else:
                self.comm.append("".join(map(chr, data)))
        if not self.udata:
            return
        self.udata.sort()
        loadr = self.udata[0][0]
        # With overlapping records, the last (highest start) record need not end highest
        hiadr = max([a + len(d) for a, d in self.udata])
        size  = hiadr - loadr
        self.data = [self.init] * size
        for adr, data in self.udata:
            dlen  = len(data)
            adr  -= loadr
            self.data[adr:adr + dlen] = data
        self.offset = loadr
        self.size   = size

-----------------------
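To test the reader (including the 32-bit S3 address and the checksum rule) without hunting for sample files, it helps to generate records. This is a minimal sketch, assuming Python 3 rather than the Python 2.2 used above; `make_srecord` and `checksum_ok` are hypothetical helper names. The checksum byte is the one's complement of the low byte of the sum of the count, address and data bytes, which is why the sum of all bytes after the type, plus 1, must be 0 modulo 256.

```python
def make_srecord(rtype, address, data=b""):
    """Build one S-record line (hypothetical helper for generating test data)."""
    # Width of the address field in bytes, by record type.
    addr_len = {'S0': 2, 'S1': 2, 'S9': 2,
                'S2': 3, 'S8': 3,
                'S3': 4, 'S7': 4}[rtype]
    body = address.to_bytes(addr_len, 'big') + bytes(data)
    count = len(body) + 1              # address + data + 1 checksum byte
    # One's complement of the low byte of the sum of count, address and data.
    checksum = 0xff - ((count + sum(body)) & 0xff)
    return '%s%02X%s%02X' % (rtype, count, body.hex().upper(), checksum)


def checksum_ok(line):
    """Same rule as readrecord(): all bytes after the type, plus 1, sum to 0 mod 256."""
    raw = bytes.fromhex(line[2:])
    return (sum(raw) + 1) & 0xff == 0


print(make_srecord('S9', 0))                          # S9030000FC
print(make_srecord('S0', 0, b'hello     \x00\x00'))   # S00F000068656C6C6F202020202000003C
print(make_srecord('S3', 0x12345678, b'\xaa\xbb'))    # S30712345678AABB7F
```

Feeding such generated lines through `SRecord.readrecords()` is a quick way to check that an S3 record comes back with address 0x12345678.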
On my development machine (1.7 GHz) it's reasonably fast with a 100 KB
file. But I'm afraid it'll suck on our production machines, which run at
166 MHz (give or take some). I thought about using the array module, but it
lacks a way to create a big array without building a list or string first.
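The `array` module can in fact do this: repeating a one-element array with `*` allocates the whole buffer in one step, so no big intermediate list or string is needed. A minimal sketch, assuming the `(address, data)` tuples collected in `udata` and Python 3 (sequence repetition on arrays should behave the same in Python 2); `build_image` is a hypothetical name, not part of the class above. It also takes the end address as the maximum over all records rather than from the last record.

```python
from array import array


def build_image(records, init=0xff):
    """Flatten (address, data) records into (offset, array of bytes).

    `records` is a list of (address, list-of-ints) tuples like SRecord.udata;
    build_image itself is a hypothetical helper.
    """
    if not records:
        return 0, array('B')
    low = min([adr for adr, _ in records])
    # Maximum over all records: with overlapping records the one with the
    # highest start address need not have the highest end address.
    high = max([adr + len(d) for adr, d in records])
    # Repeating a one-element array allocates the whole buffer in one step,
    # without an intermediate list of `high - low` Python ints.
    image = array('B', [init]) * (high - low)
    for adr, d in records:
        start = adr - low
        # Slice assignment into an array needs an array with the same typecode.
        image[start:start + len(d)] = array('B', d)
    return low, image


offset, image = build_image([(0x100, [1, 2, 3]), (0x105, [9])])
print(hex(offset), image.tolist())   # 0x100 [1, 2, 3, 255, 255, 9]
```

In Python 3, `image.tobytes()` then yields the raw image as a byte string for sending.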

Anyway, does anyone see a way to speed this up? I'm not going to inline
readrecord(), as I don't care about a 10 % gain. I'm asking whether you see
a real flaw in my algorithm.
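One likely hot spot is the per-byte `int(line[i:i + 2], 16)` call plus `reduce()`, both running in interpreted Python for every byte. Below is a sketch of an alternative, assuming Python 3: `binascii.unhexlify` converts the whole hex payload in one C-level call, and `sum()` totals the result. `decode_record` and `ADDR_LEN` are hypothetical names; the function returns the raw record type instead of the 'C'/'D'/'S' codes used above, also accepts S5/S6 count records, and checks the count byte. Under Python 2, `unhexlify` returns a `str`, so you would wrap it in `array('B', ...)` to get integers. Whether this is fast enough at 166 MHz would still have to be measured.

```python
import binascii

# Width of the address field in bytes, by record type.
ADDR_LEN = {'S0': 2, 'S1': 2, 'S5': 2, 'S9': 2,
            'S2': 3, 'S6': 3, 'S8': 3,
            'S3': 4, 'S7': 4}


def decode_record(line, check=True):
    """Return (rtype, address, data) for one S-record line (hypothetical helper)."""
    rtype = line[:2]
    if rtype not in ADDR_LEN:
        raise ValueError("not a valid S-record: %r" % rtype)
    raw = binascii.unhexlify(line[2:])     # count, address, data, checksum as bytes
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    if check and (sum(raw) + 1) & 0xff:
        raise ValueError("checksum error")
    n = ADDR_LEN[rtype]
    address = int.from_bytes(raw[1:1 + n], 'big')
    return rtype, address, raw[1 + n:-1]   # data without the checksum byte


print(decode_record('S30712345678AABB7F'))   # ('S3', 305419896, b'\xaa\xbb')
```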

<pipe-dreaming mode on>
Whenever I play with binary data in Python, a dream of a mutable string
data type crops up. Doing byte fiddling with strings is quite OK as long
as the data is comparably small. But when the thing gets largish, the
slicing, copying and reassembling become increasingly inelegant, not
to say "un-pythonic." Even if that hypothetical mutable string type
wouldn't be returned by read() and wouldn't be accepted by write(),
conversion from and to normal immutable strings should be cheap.
<pipe-dreaming mode off>
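For what it's worth, later Python versions added pretty much this type: `bytearray` (since 2.6) is a mutable byte string, and in Python 3 `readinto()` fills one directly, `write()` accepts one, and `memoryview` gives zero-copy slices of it. A small sketch, assuming Python 3; `patch` is a hypothetical helper and the addresses are made up for illustration.

```python
import io


def patch(image, offset, address, payload):
    """Overwrite part of a mutable image in place (hypothetical helper)."""
    start = address - offset
    if start < 0 or start + len(payload) > len(image):
        raise ValueError("patch lies outside the image")
    image[start:start + len(payload)] = payload


src = io.BytesIO(b'\xff' * 8)     # stands in for an opened binary file
image = bytearray(8)              # preallocated, mutable, zero-filled
src.readinto(image)               # read straight into the mutable buffer
patch(image, 0x1000, 0x1002, b'\x12\x34')   # e.g. device specific data
image[7] ^= 0x0f                  # single-byte fiddling, no copying
view = memoryview(image)[2:4]     # zero-copy window into the buffer
out = io.BytesIO()
out.write(image)                  # write() accepts a bytearray directly
print(bytes(image).hex())         # ffff1234fffffff0
```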

Hope you're not distracted by the German comments; I'm just presuming you
know what S-records are,

Hans-Joachim