[Tutor] Re: handling string!!
Andrei
project5 at redrival.net
Thu Oct 23 16:53:54 EDT 2003
Daniel Ehrenberg wrote:
<snip>
> I have a somewhat related question. I am trying to
> write a program to parse the simple markup language
> used at Wikipedia.org. For this specific question, the
> markup is the same as in MoinMoin.
Kirk Bailey (who's around here too) has an open source Wiki at
tinylist.org, written in Python:
http://www.tinylist.org/cgi-bin/wikinehesaed2.py
It handles this kinda thing quite well, I just tested it at the bottom of
http://www.tinylist.org/cgi-bin/wikinehesaed2.py/SandBox. Perhaps you
should look at its code.
> '''bold''' -> <strong>bold</strong>
> ''italics'' -> <em>italics</em>
> '''''bold and italics''''' -> <strong><em>bold and
> italics</em></strong>
> '''''b & i'' b''' -> <strong><em>b & i</em> b</strong>
> '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
<snip>
> would parse the bold parts of the text. It would be
> similar for the code processing italics and the
> combination of bold and italics, doing the ones with
> the most apostrophies first and the least apostrophies
> last (ie. first bold and italics, then bold, then
> italics). However, I don't see how I could do the same
> with the forth and fifth examples. Could you help me
> with that?
You just have to keep track of which tags are open and apply the "first
opened, last closed" principle: keep a list, append each tag to it when you
open it, and remove tags from the end of the list when you close them. In
your 5th example:
> '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>
your parser would e.g. first hit ''' (open <strong> and append it to the
OpenTags list), then the '' (open <em> and append it to the OpenTags list).
When it finds the closing ''', it tries to close the <strong>, but it
notices in the OpenTags list that there are tags opened after it. It closes
those first (in this case, the last tag in OpenTags is <em>, so it closes
it first and moves it to a different list, say RestoreTags), then it closes
the <strong> and reopens the ones in RestoreTags, which puts them back on
the OpenTags list. The generated code is then:
<strong><em>b & i</em></strong><em> i</em>
Which is not perfect, but it's valid XHTML :). Making it really intelligent
would be quite a bit harder, especially if you consider you might be
nesting more tags.
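Here is a rough Python sketch of that bookkeeping, just to make the idea
concrete. It is not taken from Kirk's Wiki or from my own tool. The names
(convert, open_tags, restore_tags) are made up for illustration, and it
assumes that ''''' is simply read as ''' followed by ''. It does no escaping
of characters like & and closes any tags still open at the end of the input.

```python
import re

# Assumption: only these two markers exist, and ''' is tried before ''.
TOKEN = re.compile(r"'''|''")
TAGS = {"'''": "strong", "''": "em"}


def convert(text):
    """Turn '''bold''' / ''italics'' markup into nested <strong>/<em> tags."""
    out = []
    open_tags = []  # tags currently open, oldest first (the OpenTags list)
    pos = 0
    for match in TOKEN.finditer(text):
        out.append(text[pos:match.start()])  # plain text before the marker
        pos = match.end()
        tag = TAGS[match.group()]
        if tag not in open_tags:
            # Opening marker: emit the tag and remember it.
            out.append("<%s>" % tag)
            open_tags.append(tag)
            continue
        # Closing marker: first close everything opened after this tag.
        restore_tags = []  # the RestoreTags list
        while open_tags[-1] != tag:
            inner = open_tags.pop()
            out.append("</%s>" % inner)
            restore_tags.append(inner)
        open_tags.pop()
        out.append("</%s>" % tag)
        # Reopen the tags that were closed only to keep the nesting valid.
        for inner in reversed(restore_tags):
            out.append("<%s>" % inner)
            open_tags.append(inner)
    out.append(text[pos:])
    # Close anything the input left open.
    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "".join(out)


print(convert("'''''b & i''' i''"))
# <strong><em>b & i</em></strong><em> i</em>
```

With this sketch your 4th example comes out exactly as you wanted, because
there the tags happen to close in the reverse of the order they were opened.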
I'm not sure this is the way Kirk's Wiki does it, but I know it would work
because I use this same principle in my regular expression tool to
highlight parentheses.
I'm wondering how you'd handle '''''' though (can be two bolds or three
italics).
--
Yours,
Andrei
=====
Mail address in header catches spam. Real contact info (decode with rot13):
cebwrpg5 at bcrenznvy.pbz. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V
ernq gur yvfg, fb gurer'f ab arrq gb PP.