tricky regular expressions

Xavier Morel xavier.morel at masklinn.net
Tue Feb 7 17:35:52 EST 2006


Ernesto wrote:
> Xavier Morel wrote:
>> Ernesto wrote:
>>> I'm not sure if I should use RE's or some other mechanism.  Thanks
>>>
>> I think a line-based state machine parser could be a better idea. Much
>> simpler to build and debug if not faster to execute.
> 
> What is a line-based state machine ?
> 
Parse your file line by line (since that seems to be how your data is 
organized).

Keep state information somewhere.

Change your state based on the current state and the data being fed to 
your parser.

For example, here you basically have 3 states:

No Title, which is the initial state of the machine (it has not 
encountered any title yet, and you do stuff based on titles)

Title loaded, when you've met a title. "Title loaded" loops on itself: 
if you meet a "Title: whatever" line, you change the title currently 
stored, but you stay in the "Title loaded" state (you change the current 
state of the machine from "title loaded" to "title loaded").

Request loaded, which can be reached only when you're in the "Title 
loaded" state and then encounter a line starting with "Request: ". When 
you reach that stage, do your processing (you have a title loaded, which 
is the latest title you encountered, and you have a request loaded, 
which is the request that immediately follows the loaded title), then 
go back to the "No Title" state, since you've processed (and therefore 
unloaded) the current title.
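
In Python, those three states could be sketched roughly like this. It's 
only a minimal sketch: the function name parse, the string state names 
and the idea of collecting (title, request) pairs in a list are my own 
assumptions for illustration, and I'm assuming the value is simply 
whatever follows the "Title: " or "Request: " prefix.

```python
def parse(lines):
    """Return (title, request) pairs, following the three states above.

    Hypothetical helper: the name and the returned list are assumptions.
    """
    state = "NoTitle"   # initial state: no title encountered yet
    title = None        # the state information kept between lines
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Title: "):
            # From NoTitle or TitleLoaded: store (or replace) the title.
            title = line[len("Title: "):].strip()
            state = "TitleLoaded"
        elif state == "TitleLoaded" and line.startswith("Request: "):
            # Request loaded: process it, then unload the title.
            pairs.append((title, line[len("Request: "):].strip()))
            title = None
            state = "NoTitle"
        # Any other line leaves the state unchanged.
    return pairs
```

A request that shows up while the machine is in "No Title" is simply 
skipped, because that transition only exists from "Title loaded".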

So, the state diagram could look something like this:
(it's supposed to be a single state diagram, but I suck at ASCII 
diagrams so I'll create one mini-diagram for each state)

NoTitle =0> TitleLoaded

=0>
Event: on encountering a line starting with "Title: "
Action: save the title (to whatever variable you see fit)
Change state to: TitleLoaded


TitleLoaded =1> TitleLoaded
     ||
     2
     \/
Request

=1>
Event: on encountering a line starting with "Title: "
Action: save the title (replace the current value of your title variable)
Change state to: TitleLoaded

=2>
Event: on encountering a line starting with "Request: "
Action: save the request (optional); immediately process the Request state
Change state to: Request


Request =3> NoTitle
   ||
   4
   \/
TitleLoaded

=3>
Event: the Request state is reached, the request is either "Play" or "Next"
Action: Do whatever you want to do; nuke the content of the title variable
Change state to: NoTitle

=4>
Event: the Request state is reached, the request is neither "Play" nor 
"Next"
Action: Nuke the content of the request variable (if you saved it), do 
nothing else
Change state to: TitleLoaded
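
Transitions 0 to 4 could be written out along these lines. Again, this 
is a sketch under assumptions rather than the definitive way to do it: 
the names State, WANTED and wanted_requests are made up, and "do 
whatever you want to do" is replaced by yielding the (title, request) 
pair. Since the Request state is processed immediately, it never has to 
be stored between two lines, so only two states appear in the enum.

```python
from enum import Enum


class State(Enum):
    NO_TITLE = "NoTitle"
    TITLE_LOADED = "TitleLoaded"


WANTED = {"Play", "Next"}  # requests that trigger processing


def wanted_requests(lines):
    """Yield (title, request) for each Play/Next request after a title."""
    state = State.NO_TITLE
    title = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Title: "):
            # Transitions 0 and 1: store or replace the title.
            title = line[len("Title: "):].strip()
            state = State.TITLE_LOADED
        elif state is State.TITLE_LOADED and line.startswith("Request: "):
            # Transition 2: the Request state is handled right away.
            request = line[len("Request: "):].strip()
            if request in WANTED:
                # Transition 3: process, nuke the title, back to NoTitle.
                yield title, request
                title = None
                state = State.NO_TITLE
            # Transition 4: otherwise forget the request and stay in
            # TitleLoaded with the same title.
```

You would use it with something like
"for title, request in wanted_requests(open(path)):", where path is 
whatever file you're parsing.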

As a final note, I'd recommend reading "Text Processing in Python". It 
puts quite a big emphasis on functional programming (which you may or 
may not appreciate), but it's an extremely good introduction to 
text-file handling, parsing and processing.


