Parsing a serial stream too slowly

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Jan 24 00:08:05 EST 2012


On Tue, 24 Jan 2012 10:49:41 +1100, Cameron Simpson wrote:

> | def OnSerialRead(self, event):
> | 	text = event.data
> | 	self.sensorabuffer = self.sensorabuffer + text 
> | 	self.sensorbbuffer = self.sensorbbuffer + text 
> | 	self.sensorcbuffer = self.sensorcbuffer + text
> 
> Slow and memory wasteful. Supposing a sensor never reports? You will
> accumulate an ever growing buffer string. And extending a string gets
> expensive as it grows.

I admit I haven't read this entire thread, but one thing jumps out at me. 
It looks like the code is accumulating strings by repeated + 
concatenation. This is risky.

In general, you should accumulate strings into a list buffer, then join 
them into a single string in one call:

def accumulate(chunks):
    buffer = []
    for text in chunks:
        buffer.append(text)
    # One join at the end copies each character only once.
    return ''.join(buffer)


Use of repeated string addition risks slow quadratic behaviour. The OP is 
reporting slow behaviour... alarm bells ring.

For anyone who doesn't understand what I mean about slow quadratic 
behaviour, read this: 

http://www.joelonsoftware.com/articles/fog0000000319.html

Recent versions of CPython include an optimization which *sometimes* can 
avoid this poor performance, but it can be defeated easily, and does not 
apply to Jython and IronPython, so it is best to not rely on it.

I don't know whether this is the cause of the OP's slow behaviour, but it 
is worth investigating. Especially since it is likely to not just be 
slow, but SLLLLLOOOOOOWWWWWWWWW -- a bad quadratic algorithm can be tens 
of thousands or millions of times slower than it need be.
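
To put a rough number on that, here is a minimal sketch that counts 
characters copied rather than timing anything. It assumes the worst case, 
where every + concatenation copies the whole string so far (that is, the 
CPython optimization does not kick in), and it assumes Python 3 division. 
The function names and the chunk sizes are made up for illustration.

```python
def chars_copied_by_concat(n_chunks, chunk_size):
    """Worst case: each `buf = buf + text` builds a brand new string."""
    copied = 0
    length = 0
    for _ in range(n_chunks):
        length += chunk_size   # length of the string after this concatenation
        copied += length       # all of it is copied into the new string
    return copied


def chars_copied_by_join(n_chunks, chunk_size):
    """''.join(list) copies each character once into the final string."""
    return n_chunks * chunk_size


# Hypothetical workload: 100,000 chunks of 10 characters each.
n, size = 100000, 10
ratio = chars_copied_by_concat(n, size) / chars_copied_by_join(n, size)
print(ratio)  # works out to (n + 1) / 2
```

Under this model the ratio grows linearly with the number of chunks, which 
is where "tens of thousands of times slower" comes from once the stream 
has delivered enough pieces.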



[...]
> The slow: You're compiling the regular expression _every_ time you come
> here (unless the re module caches things, which I seem to recall it may).

It does.


> But that efficiency is only luck.

More deliberate design than good luck :)

Nevertheless, good design would have you compile the regex once, and not 
rely on the re module's cache.
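
A minimal sketch of compiling once at module level and reusing the 
compiled pattern in the handler. The line format ("A:12.5"), the pattern, 
and the names READING_RE and parse_reading are all hypothetical, since the 
OP's actual sensor format isn't shown here.

```python
import re

# Compiled once, when the module is imported, not on every read event.
# Hypothetical format: a sensor letter A-C, a colon, then a number.
READING_RE = re.compile(r'^(?P<sensor>[A-C]):(?P<value>-?\d+(?:\.\d+)?)$')


def parse_reading(line):
    """Return (sensor, value) for a well-formed line, else None."""
    match = READING_RE.match(line)
    if match is None:
        return None
    return match.group('sensor'), float(match.group('value'))
```

The handler then only calls parse_reading(line), and the cost of building 
the pattern is paid a single time regardless of what the re cache does.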


[...]
> Regex _is_ slow. It is good for flexible lexing, but generally Not Fast.

I hope I will never be mistaken for a re fanboy, but credit where credit 
is due: if you need the full power of a regex, you almost certainly can't 
write anything in Python that will beat the re module. 

However, where regexes become a trap is that often people use them for 
things which are best coded as simple Python tests that are much faster, 
such as using a regex where a simple str.startswith() would do.
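
As a small illustration, these two hypothetical helpers answer the same 
question, "does this line start with the prefix A:?". The prefix and the 
function names are assumptions made up for the example. The second version 
needs no regex machinery at all.

```python
import re

PREFIX = 'A:'                      # hypothetical sensor prefix
PREFIX_RE = re.compile(r'A:')      # the regex equivalent of that prefix


def is_sensor_a_regex(line):
    # match() anchors at the start of the string, so this is a prefix test.
    return PREFIX_RE.match(line) is not None


def is_sensor_a_plain(line):
    # Same test as a plain string method call.
    return line.startswith(PREFIX)
```

If you care about the actual difference on your machine, timeit on both 
functions with representative lines will tell you more than guessing.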


-- 
Steven


