[Tutor] Re: handling string!!

Andrei project5 at redrival.net
Thu Oct 23 16:53:54 EDT 2003


Daniel Ehrenberg wrote:

<snip>
> I have a somewhat related question. I am trying to
> write a program to parse the simple markup language
> used at Wikipedia.org. For this specific question, the
> markup is the same as in MoinMoin.

Kirk Bailey (who's around here too) has an open source Wiki at 
tinylist.org, written in Python:

http://www.tinylist.org/cgi-bin/wikinehesaed2.py

It handles this kind of thing quite well; I just tested it at the bottom of 
http://www.tinylist.org/cgi-bin/wikinehesaed2.py/SandBox. Perhaps you 
should look at its code.

> '''bold''' -> <strong>bold</strong>
> ''italics'' -> <em>italics</em>
> '''''bold and italics''''' -> <strong><em>bold and
> italics</em></strong>
> '''''b & i'' b''' -> <strong><em>b & i</em> b</strong>
> '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>

<snip>

> would parse the bold parts of the text. It would be
> similar for the code processing italics and the
> combination of bold and italics, doing the ones with
> the most apostrophes first and the least apostrophes
> last (ie. first bold and italics, then bold, then
> italics). However, I don't see how I could do the same
> with the fourth and fifth examples. Could you help me
> with that?

You just have to keep track of what you have open and apply the first 
opened, last closed principle: use a list to which you append tags when you 
open them, and from which you remove them, starting from the last, when you 
close them. In your 5th example:

 > '''''b & i''' i'' -> <em><strong>b & i</strong> i</em>

your parser would e.g. first hit ''' (open <strong> and append it to the 
OpenTags list), then the '' (open <em> and append it to the OpenTags list). 
When it finds the closing ''', it tries to close the <strong>, but it 
notices in the OpenTags list that tags opened after it are still open. It 
closes those first (in this case, the last tag in OpenTags is <em>, so it 
closes that and moves it to a different list, say RestoreTags), then it 
closes the <strong> and reopens the tags in RestoreTags, which obviously 
end up on the OpenTags list again. The generated code is then:

<strong><em>b & i</em></strong><em> i</em>

Which is not perfect, but it's valid XHTML :). Making it really intelligent 
would be quite a bit harder, especially if you consider you might be 
nesting more tags.
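
A minimal sketch of the bookkeeping described above, in case it helps. The 
names convert(), toggle() and markers_for() are made up for illustration and 
are not taken from Kirk's Wiki or MoinMoin. It assumes a run of five 
apostrophes means bold plus italics, leaves any other run length (such as 
six) as literal text, and does not escape characters like & in the output.

```python
import re


def markers_for(run, open_tags):
    """Map a run of apostrophes to tag names (assumption: only 2, 3, 5 are markup)."""
    if len(run) == 2:
        return ["em"]
    if len(run) == 3:
        return ["strong"]
    if len(run) == 5:
        # If <em> is the innermost open tag, handle it first so that a
        # closing ''''' does not trigger a needless close-and-reopen.
        if open_tags and open_tags[-1] == "em":
            return ["em", "strong"]
        return ["strong", "em"]
    return None  # ambiguous run length: treated as plain text here


def toggle(tag, open_tags, out):
    """Open `tag` if it is closed; otherwise close it, restoring inner tags."""
    if tag not in open_tags:
        open_tags.append(tag)
        out.append(f"<{tag}>")
        return
    idx = open_tags.index(tag)
    restore_tags = open_tags[idx + 1:]  # tags opened after `tag`
    for inner in reversed(restore_tags):
        out.append(f"</{inner}>")       # close the inner tags first
    out.append(f"</{tag}>")
    del open_tags[idx:]
    for inner in restore_tags:          # reopen what had to be closed
        open_tags.append(inner)
        out.append(f"<{inner}>")


def convert(text):
    out = []
    open_tags = []
    # The capturing group makes re.split keep the apostrophe runs.
    for piece in re.split(r"('{2,})", text):
        markers = markers_for(piece, open_tags) if piece.startswith("''") else None
        if markers is None:
            out.append(piece)
            continue
        for tag in markers:
            toggle(tag, open_tags, out)
    # Assumption: tags left open at the end are closed, innermost first.
    for tag in reversed(open_tags):
        out.append(f"</{tag}>")
    return "".join(out)


print(convert("'''''b & i''' i''"))
```

For the 5th example this prints the same markup as shown above, with the 
<em> closed and reopened around the </strong>.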

I'm not sure this is the way Kirk's Wiki does it, but I know it would work 
because I use this same principle in my regular expression tool to 
highlight parentheses.

I'm wondering how you'd handle '''''' though (can be two bolds or three 
italics).

-- 
Yours,

Andrei

=====
Mail address in header catches spam. Real contact info (decode with rot13):
cebwrpg5 at bcrenznvy.pbz. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V 
ernq gur yvfg, fb gurer'f ab arrq gb PP.




