RE - non-greedy - some greediness - complete greediness
Michele Simionato
mis6 at pitt.edu
Sun Nov 10 13:36:03 EST 2002
Doru-Catalin Togea <doru-cat at ifi.uio.no> wrote in message news:<mailman.1036918692.17309.python-list at python.org>...
> Hi!
>
> I am working on a little project where I need REs with a self-defined
> level of greediness.
>
> I am processing text in a LaTeX-like manner, for those of you who are
> familiar with it.
>
> I want to be able to recognize LaTeX-like commands within my text. A
> command starts with a character, say '\', followed by the command's name
> and then by the text to which the command applies, enclosed in curly
> brackets. Examples:
>
> \footnote{some text}
> \cite{some reference}
> \section{some title}
>
> Recognizing such patterns and retrieving the name of the command and the
> text to which the command applies is not so difficult to achieve. Things
> get complicated, though, when one encounters nested commands.
>
> Say I have the following string:
>
> "abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
>                                  ^     ^                           ^
>                                  1     2                           3
> 1: closing bracket of nested \cite
> 2: closing bracket of first \footnote
> 3: closing bracket of second \footnote
>
> With non-greedy REs, the following footnotes would be recognized:
> 1) \footnote{efgh \cite{myRef1}
> 2) \footnote{hello there}
>
> The first match is wrong because the first '}' encountered belongs to
> the nested \cite command. The second match is correct.
>
> With greedy REs, the following pattern would be recognized:
> 1) \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}
>
> This is wrong because a) there are two \footnote commands in my string, so
> I should get two matches, and b) the first \footnote command should be
> applied only to the "efgh \cite{myRef1} ijkl" substring.
>
> In other words I need to be able to specify the level of greediness my REs
> should have. Is it possible? If yes, how?
>
> I have an idea of how to solve my problem without REs, by counting the
> opening and closing curly brackets and reasoning algorithmically about
> where within my text each nested command begins and ends. The question is:
> can the above-stated problem be solved elegantly by means of REs?
>
> With best regards,
> Catalin
>
>
> <<<< ================================== >>>>
> << We are what we repeatedly do. >>
> << Excellence, therefore, is not an act >>
> << but a habit. >>
> <<<< ================================== >>>>
Look at this thread:
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=2259b0e2.0211060659.30631e4f%40posting.google.com&prev=/groups%3Fdq%3D%26num%3D25%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dcomp.lang.python%26start%3D150
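
For reference, here is a minimal sketch of the brace-counting idea
mentioned in the question. It is not taken from the linked thread. It
uses a regular expression only to find where a command starts, then
counts braces to find the matching closing one, since Python's re module
has no recursive patterns and so cannot match arbitrarily nested braces
by itself. The names find_commands and COMMAND_START are hypothetical,
and the sketch assumes command names are made of ASCII letters and that
braces are never escaped.

```python
import re

# Start of a command: a backslash, a name, and the opening brace.
# Assumption: command names consist of ASCII letters only.
COMMAND_START = re.compile(r"\\([A-Za-z]+)\{")


def find_commands(text):
    """Return (name, argument) pairs for the top-level commands in text.

    A nested command is left inside the argument of the command that
    encloses it; call find_commands() on the argument to reach it.
    """
    results = []
    pos = 0
    while True:
        match = COMMAND_START.search(text, pos)
        if match is None:
            break
        depth = 1              # we are just past the opening brace
        i = match.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unbalanced braces after position %d" % match.start())
        # text[i - 1] is the matching closing brace, so stop just before it.
        results.append((match.group(1), text[match.end():i - 1]))
        pos = i                # resume after the matching closing brace
    return results


sample = r"abcd \footnote{efgh \cite{myRef1} ijkl} mnop \footnote{hello there}"
top_level = find_commands(sample)
nested = find_commands(top_level[0][1])
```

On the string from the question, top_level holds the two \footnote
commands with their full arguments, and nested holds the \cite found
inside the first footnote.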