emacs lisp text processing example (html5 figure/figcaption)

Tue Jul 5 18:09:46 EDT 2011

On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee <xahlee at gmail.com> wrote:
> but in anycase, i can't see how this part would work
> <p class="cpt">((?:[^<]|<(?!/p>))+)</p>

It's not that different from the pattern 「alt="[^"]+"」 earlier in the
regex.  The capture group accepts one or more characters that either
aren't '<', or are '<' but are not immediately followed by '/p>' (that
check is the negative lookahead (?!/p>), which looks ahead without
consuming anything).  Thus the group stops at the first '</p>' without
consuming its '<', leaving the whole '</p>' for the literal that follows
the group.  Running my regex on the example that you posted earlier
demonstrates that it works:

>>> import re
>>> s = '''<div class="img">
... <img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
... <p class="cpt">jamie's cat! Her blog is <a href="http://example.com/
... jamie/">http://example.com/jamie/</a></p>
... </div>'''
>>> print(re.sub(pattern, replace, s))
<figure>
<img src="jamie_cat.jpg" alt="jamie's cat" width="167" height="106">
<figcaption>jamie's cat! Her blog is <a href="http://example.com/
jamie/">http://example.com/jamie/</a></figcaption>
</figure>
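
To see the capture group on its own, here is a minimal Python 3 sketch.
The names CAPTION, NAIVE, GREEDY, html and two are illustrative and not
from the thread.  It contrasts the group with two simpler attempts:
[^<]+, which can't get past the nested <a> tag, and a greedy (.+), which
runs on to the last '</p>' when a string holds two captions.

```python
import re

# The caption group: one or more characters that are either not '<',
# or are '<' not immediately followed by '/p>' (negative lookahead).
CAPTION = re.compile(r'<p class="cpt">((?:[^<]|<(?!/p>))+)</p>')

# Forbidding every '<' fails as soon as the caption contains a tag.
NAIVE = re.compile(r'<p class="cpt">([^<]+)</p>')

# A greedy .+ overshoots to the LAST '</p>' in the string.
GREEDY = re.compile(r'<p class="cpt">(.+)</p>')

html = ('<p class="cpt">jamie\'s cat! Her blog is '
        '<a href="http://example.com/jamie/">http://example.com/jamie/</a></p>')

m = CAPTION.search(html)
print(m.group(1))
# jamie's cat! Her blog is <a href="http://example.com/jamie/">http://example.com/jamie/</a>

print(NAIVE.search(html))
# None

two = '<p class="cpt">first</p><p class="cpt">second</p>'
print(CAPTION.findall(two))
# ['first', 'second']
print(GREEDY.findall(two))
# ['first</p><p class="cpt">second']
```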

Cheers,
Ian