Need a specific sort of string modification. Can someone help?

Chris Angelico rosuav at gmail.com
Sat Jan 5 15:32:13 EST 2013


On Sun, Jan 6, 2013 at 7:04 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> On Sat, Jan 5, 2013 at 8:57 AM, Chris Angelico <rosuav at gmail.com> wrote:
>> You miss my point, though. I went for simple Pythonic code, and never
>> measured its performance, on the expectation that it's "good enough".
>> Written in C, the state machine is probably WAY faster than splitting
>> and then iterating. My C++ MUD client uses code similar to that to
>> parse TELNET and ANSI codes from a stream of bytes in a socket (and
>> one of its "states" is that there's no more data available, so wait on
>> the socket); the rewrite in a high level language divides the string
>> on "\xFF" for TELNET and "\x1B" for ANSI, working them separately, and
>> then afterward splits on "\n" to divide into lines. The code's much
>> less convoluted, it's easier to test different parts (because I can
>> simply call the ANSI parser with a block of text), and on a modern
>> computer, you can't see the performance difference (since you spend
>> most of your time waiting for socket data anyway).
>
> Anecdotally and somewhat off-topic, when I wrote my own MUD client in
> Python, I implemented both TELNET and ANSI parsing in Python using a
> state machine processing one byte at a time (actually two state
> machines - one at the protocol layer and one at the client layer; the
> telnet module is a modified version of the twisted.conch.telnet
> module).  I was worried that the processing would be too slow in
> Python.  When I got it running, it turned out that there was a
> noticeable lag between input being received and displayed.  But when I
> profiled the issue it actually turned out that the rich text control I
> was using, which was written in C++, wasn't able to handle a large
> buffer gracefully.  The parsing code in Python did just fine and
> didn't contribute to the lag issue at all.

The opposite decision, the same result. Performance was *good enough*.
With Gypsum, my early performance problems were almost completely
solved by properly handling the expose-event and only painting the
parts that changed (I don't use a rich text control, I use a
GTK2.DrawingArea). Text processing of something that's come over a
network is seldom a problem; if it were, we'd see noticeably slower
downloads over HTTPS than HTTP, and that just doesn't happen. I think
(though I can't prove) that crypto written in C is probably more
expensive than ANSI parsing written in Python.

ChrisA



More information about the Python-list mailing list