Please help... with re

Olivier Dagenais olivierS.dagenaisP at canadaA.comM
Wed Jul 26 16:08:39 EDT 2000


First, something incredibly odd happened when I hit "Reply" in Outlook
Express.  Maybe you've got your mailer's MIME encoding set to the wrong
thing?

Anyway, when I see this problem, I am reminded of my "Compiler Construction"
course.  I would recommend something like this:

A - read your input one character at a time, collecting characters in a
buffer
B - when you encounter a space, add all "buffered" characters to the list
as one item and clear the buffer
C - if you encounter a quote, ignore rule B until you hit the closing quote
D - if you hit a backslash, ignore rule C for the next character, so an
escaped quote is kept as an ordinary character
E - once you run out of characters, add any remaining "buffered" characters
to the list

You can build a nice little Finite State Machine that should make this easy
and most likely faster (I'm guessing) than a regular expression.  This
assumes that you don't need to buffer very long words and that you have
enough memory to store the resulting list, which needs at least as much
room as the input it was built from (your largest file, in the worst case).
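
Rules A through E can be sketched as a small state machine.  This is only a
minimal sketch in present-day Python, not anyone's tested production code:
the function name split_quoted and the flag names are made up for
illustration, and it assumes that only double quotes group words and that a
backslash makes the next character literal.

```python
def split_quoted(line):
    """Split a line on spaces, keeping double-quoted runs together."""
    tokens = []        # finished substrings
    buffer = []        # characters of the substring being built
    in_quotes = False  # rule C: are we between two quotes?
    escaped = False    # rule D: was the previous character a backslash?

    for ch in line:                        # rule A: one character at a time
        if escaped:
            buffer.append(ch)              # take the character literally
            escaped = False
        elif ch == "\\":
            escaped = True                 # rule D: protect the next character
        elif ch == '"':
            in_quotes = not in_quotes      # rule C: enter or leave a quoted run
        elif ch == " " and not in_quotes:  # rule B: a space ends a substring
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
        else:
            buffer.append(ch)

    if buffer:                             # rule E: flush what is left
        tokens.append("".join(buffer))
    return tokens


example = r'This is an "example of a \"splitted\" text " by my monster.'
print(split_quoted(example))
```

Note that this sketch keeps punctuation, so the last item comes out as
'monster.' rather than 'monster', and it silently drops empty substrings
such as a bare "".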

A speedup tip I read somewhere: use string.join ( ) whenever you have lots
of concatenations to do, because it is a lot faster.  It allocates the new
string only once, instead of building a new string for every concatenation.
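
To illustrate the tip, here is a small sketch in present-day Python, where
the join is spelled as a string method, "".join(parts), rather than as a
function in the string module.  The helper names are hypothetical, and no
timings are claimed; how much it helps depends on the interpreter and on
the data.

```python
def concat_with_plus(parts):
    """Build the result with repeated concatenation."""
    result = ""
    for part in parts:
        # Each step may create a new string and copy what was built so far.
        result = result + part
    return result


def concat_with_join(parts):
    """Build the result in one call, from all the pieces at once."""
    return "".join(parts)


words = ["giant", " ", "log", " ", "file"]
print(concat_with_plus(words) == concat_with_join(words))
```

Both helpers return the same string; the difference is only in how many
intermediate strings get built along the way.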

Sorry if I didn't really answer your question, but I hope I might have
helped anyway...

----------------------------------------------------------------------
Olivier A. Dagenais - Carleton University - Computer Science III


----- Original Message -----
From: "Gilles Lenfant" <glt at e-pack.net>
Newsgroups: comp.lang.python
Sent: Wednesday, July 26, 2000 17:33
Subject: Please help... with re


Hi,
I made a horrid 68-line monster to split a string into a list of substrings,
based on the following example:

This is an "example of a \"splitted\" text " by my monster.

results in this list:

[ 'This' , 'is' , 'an' , 'example of a "splitted" text ' , 'by' , 'my' ,
'monster' ]

But the thing is too slow to parse the lines of giant log files.
I would like to use the "re" package to make a shorter and faster script,
but understanding its patterns/methods is beyond my poor brain's
capabilities.
I have burned my last neurons trying to do it, and I'm close to the edge of
a nervous breakdown.
Who can help me get it to work?

Many, many thanks in advance!

Gilles Lenfant
glt at equod.com





