RE - non-greedy - some greediness - complete greediness

Michele Simionato mis6 at pitt.edu
Sun Nov 10 13:36:03 EST 2002


Doru-Catalin Togea <doru-cat at ifi.uio.no> wrote in message news:<mailman.1036918692.17309.python-list at python.org>...
> Hi!
> 
> I am working on a little project where I need REs with a self-defined
> level of greediness.
> 
> I am processing text in a LaTeX-like manner, for those of you who are
> familiar with it.
> 
> I want to be able to recognize LaTeX-like commands within my text. A
> command starts with a character, say '\', followed by the command's name
> and then by the text to which the command applies, enclosed in curly
> brackets. Examples:
> 
> 	\footnote{some text}
> 	\cite{some reference}
> 	\section{some title}
> 
> Recognizing such patterns and retrieving the name of the command and the
> text to which the command applies is not so difficult to achieve. Things
> get complicated though when one encounters nested commands.
> 
> Say I have the following string:
> 
>  "abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
>                                   ^     ^                           ^
>  closing bracket of nested \cite  |     |                           |
>     closing bracket of first \footnote  |                           |
>                                closing bracket of second \footnote  |
> 
> With non-greedy REs, the following footnotes would be recognized:
> 	1) \footnote{efgh \cite{myRef1}
> 	2) \footnote{hello there}
> 
> The first match is wrong because the first '}' encountered belongs to
> the nested \cite command. The second match is correct.
> 
> With greedy REs, the following pattern would be recognized:
> 	1) \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}
> 
> This is wrong because a) there are two \footnote commands in my string, so
> I should get two matches, and b) the first \footnote command should be
> applied only to the "efgh \cite{myRef1} ijkl" substring.
> 
> In other words, I need to be able to specify the level of greediness my REs
> should have. Is that possible? If so, how?
> 
> I have an idea of how to solve my problem without REs, by counting the
> opening and closing curly brackets and reasoning algorithmically about
> where within my text each nested command begins and ends. The question is:
> can the above-stated problem be solved elegantly by means of REs?
> 
> With best regards,
> Catalin
> 
> 
> 	<<<< ================================== >>>>
> 	<<     We are what we repeatedly do.      >>
> 	<<  Excellence, therefore, is not an act  >>
> 	<<             but a habit.               >>
> 	<<<< ================================== >>>>

Look at this thread:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=2259b0e2.0211060659.30631e4f%40posting.google.com&prev=/groups%3Fdq%3D%26num%3D25%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.python%26start%3D150
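For anyone who wants to reproduce the behaviour described in the quoted
message, here is a minimal sketch using the standard re module. The two
patterns are my own guess at what "non-greedy" and "greedy" look like
here; the original post does not show the REs that were actually used.

```python
import re

# Sample string taken from the quoted message.
text = r"abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"

# Non-greedy (assumed pattern): '.*?' stops at the first '}', which
# belongs to the nested \cite, so the first footnote is cut short.
non_greedy = re.findall(r"\\footnote\{.*?\}", text)

# Greedy (assumed pattern): '.*' runs to the last '}' in the string,
# so both footnotes are swallowed into a single match.
greedy = re.findall(r"\\footnote\{.*\}", text)

for match in non_greedy + greedy:
    print(match)
```

Neither quantifier counts brackets, so neither can pick out the '}' that
balances the opening '{'.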

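The bracket-counting idea mentioned at the end of the quoted message can
be sketched as below. This is only an illustration, not necessarily what
the linked thread proposes. The names COMMAND_START and find_commands are
made up for this example, and the sketch assumes that command names
consist of ASCII letters only and that the text contains no escaped
braces such as \{ or \}.

```python
import re

# Hypothetical helper pattern: a backslash, a command name made of
# letters, then the opening '{' of the argument.
COMMAND_START = re.compile(r"\\([A-Za-z]+)\{")


def find_commands(text):
    """Yield (name, argument) pairs for the outermost commands in text.

    An RE locates the start of each command; the matching '}' is then
    found by counting opening and closing braces.
    """
    pos = 0
    while True:
        m = COMMAND_START.search(text, pos)
        if m is None:
            return
        depth = 1          # we are just past the opening '{'
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unbalanced braces after position %d" % m.start())
        # text[i - 1] is the '}' that closes this command.
        yield m.group(1), text[m.end():i - 1]
        pos = i            # continue after the closing brace


sample = r"abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
outer = list(find_commands(sample))
# Nested commands can be found by running the same function on an argument.
inner = list(find_commands(outer[0][1]))
```

The RE only locates where a command starts; the nesting itself is
handled by the counter, since the standard re module has no recursive
patterns.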