Need a specific sort of string modification. Can someone help?

Sat Jan 5 15:47:03 EST 2013

In article <mailman.142.1357417943.2939.python-list at python.org>,
 Chris Angelico <rosuav at gmail.com> wrote:

> On Sun, Jan 6, 2013 at 7:04 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> > On Sat, Jan 5, 2013 at 8:57 AM, Chris Angelico <rosuav at gmail.com> wrote:
> >> You miss my point, though. I went for simple Pythonic code, and never
> >> measured its performance, on the expectation that it's "good enough".
> >> Written in C, the state machine is probably WAY faster than splitting
> >> and then iterating. My C++ MUD client uses code similar to that to
> >> parse TELNET and ANSI codes from a stream of bytes in a socket (and
> >> one of its "states" is that there's no more data available, so wait on
> >> the socket); the rewrite in a high level language divides the string
> >> on "\xFF" for TELNET and "\x1B" for ANSI, working them separately, and
> >> then afterward splits on "\n" to divide into lines. The code's much
> >> less convoluted, it's easier to test different parts (because I can
> >> simply call the ANSI parser with a block of text), and on a modern
> >> computer, you can't see the performance difference (since you spend
> >> most of your time waiting for socket data anyway).
> >
> > Anecdotally and somewhat off-topic, when I wrote my own MUD client in
> > Python, I implemented both TELNET and ANSI parsing in Python using a
> > state machine processing one byte at a time (actually two state
> > machines - one at the protocol layer and one at the client layer; the
> > telnet module is a modified version of the twisted.conch.telnet
> > module).  I was worried that the processing would be too slow in
> > Python.  When I got it running, it turned out that there was a
> > noticeable lag between input being received and displayed.  But when I
> > profiled the issue it actually turned out that the rich text control I
> > was using, which was written in C++, wasn't able to handle a large
> > buffer gracefully.  The parsing code in Python did just fine and
> > didn't contribute to the lag issue at all.
> 
> The opposite decision, the same result. Performance was *good enough*.
> With Gypsum, my early performance problems were almost completely
> solved by properly handling the expose-event and only painting the
> parts that changed (I don't use a rich text control, I use a
> GTK2.DrawingArea). Text processing of something that's come over a
> network is seldom a problem; if it were, we'd see noticeably slower
> downloads over HTTPS than HTTP, and that just doesn't happen. I think
> (though I can't prove) that crypto written in C is probably more
> expensive than ANSI parsing written in Python.
> 
> ChrisA
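
The two parsing strategies contrasted in the quoted messages can be sketched in Python. The first follows Chris's approach: split on the delimiter, then handle each piece. The second follows Ian's approach: a state machine that reads one character at a time. This is a minimal illustration under stated assumptions, not code from either client. The function names are hypothetical. It handles only ANSI CSI sequences (ESC "[" then parameters, ending with a final character in the range "@" to "~"). It treats any other escape as ESC plus exactly one character, and it ignores TELNET entirely. Both functions treat every ESC as the start of a new sequence, so they produce the same output.

```python
ESC = "\x1b"


def _is_final_byte(ch: str) -> bool:
    """A CSI sequence ends with a character in the range '@' (0x40) to '~' (0x7E)."""
    return "@" <= ch <= "~"


def strip_ansi_split(text: str) -> str:
    """Split-based approach: divide on ESC, then trim the sequence off each piece."""
    pieces = text.split(ESC)
    out = [pieces[0]]  # text before the first ESC contains no escape sequence
    for piece in pieces[1:]:
        if piece.startswith("["):
            # CSI: skip parameter bytes up to and including the final byte.
            for i in range(1, len(piece)):
                if _is_final_byte(piece[i]):
                    out.append(piece[i + 1:])
                    break
            # If no final byte is found, the unfinished sequence is dropped.
        else:
            # Assumption: any other escape is ESC plus exactly one character.
            out.append(piece[1:])
    return "".join(out)


def strip_ansi_state(text: str) -> str:
    """State-machine approach: examine one character at a time."""
    TEXT, AFTER_ESC, IN_CSI = range(3)
    state = TEXT
    out = []
    for ch in text:
        if ch == ESC:
            state = AFTER_ESC  # every ESC starts a new sequence
        elif state == TEXT:
            out.append(ch)
        elif state == AFTER_ESC:
            # "[" opens a CSI sequence; anything else completes a two-character escape.
            state = IN_CSI if ch == "[" else TEXT
        elif _is_final_byte(ch):  # state == IN_CSI
            state = TEXT
    return "".join(out)


if __name__ == "__main__":
    raw = "Hello \x1b[1;31mred\x1b[0m world\nsecond line"
    # Strip the escapes, then split into lines afterward, as Chris describes.
    print(strip_ansi_split(raw).split("\n"))  # ['Hello red world', 'second line']
    print(strip_ansi_state(raw).split("\n"))  # ['Hello red world', 'second line']
```

The split version leans on `str.split`, which runs in C, and each piece can be tested on its own. The state machine is arguably easier to adapt to streaming input, where a sequence may be cut off at the end of one socket read. In that case the current state would simply be kept between reads.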

It's rare to find applications these days that are truly CPU bound.
Once you've used a reasonable algorithm (i.e. not done anything in
O(n^2) that could have been done in O(n) or O(n log n)), you will more
often run up against I/O speed, database speed, network latency, memory
exhaustion, or something similar as the reason your code is too slow.
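Here is a small hypothetical example of the O(n^2) versus O(n) distinction. It is not from the thread. The task is removing duplicates from a list while keeping the original order. Testing membership in a list scans that list, so the first version is quadratic. A set lookup is a hash probe, so the second version is linear on average. The timing uses `timeit` from the standard library. Actual numbers vary by machine, so none are shown.

```python
import timeit


def dedupe_quadratic(items):
    """O(n^2): each `in` check scans the whole output list."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def dedupe_linear(items):
    """O(n) on average: set membership is a hash lookup."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


if __name__ == "__main__":
    data = list(range(3000)) * 2  # every value appears twice
    assert dedupe_quadratic(data) == dedupe_linear(data)
    for fn in (dedupe_quadratic, dedupe_linear):
        seconds = timeit.timeit(lambda: fn(data), number=1)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions return the same list. Only the growth rate differs, and that difference widens as the input gets larger.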

The upshot of this is that for most things Python, even though it runs
an order of magnitude slower than C, will be *good enough*.

I'm sure I've mentioned this before, but the application code for 
Songza.com is 100% Python.  We pretty much can't even measure how much 
CPU time is spent running Python code.  Everything is database, network, 
and (occasionally) disk throughput.
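A quick way to tell whether code is waiting rather than computing is to compare two timers for the same call. CPU time comes from `time.process_time()`, and wall-clock time comes from `time.perf_counter()`. The sketch below is an illustration under stated assumptions, not Songza's code. `simulated_io` is a hypothetical stand-in that uses `time.sleep` in place of a real database or network call.

```python
import time


def simulated_io(seconds=0.2):
    """Hypothetical stand-in for a database or network call: it just waits."""
    time.sleep(seconds)


def cpu_work(n=200_000):
    """Pure-Python computation that keeps the CPU busy."""
    return sum(i * i for i in range(n))


def cpu_fraction(func):
    """Return CPU time divided by wall-clock time for one call of func.

    For single-threaded code, a value near 1.0 suggests the call is CPU
    bound. A value near 0.0 suggests it spends most of its time waiting
    (I/O, sleep, locks).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    for fn in (simulated_io, cpu_work):
        print(f"{fn.__name__}: CPU/wall = {cpu_fraction(fn):.2f}")
```

A ratio near 0 for a real request handler would point at database, network, or disk waits rather than Python execution. To find *which* function is slow, as in Ian's profiling anecdote, the standard library's `cProfile` module gives a per-function breakdown.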