regexp

Jonathan Curran jonc at icicled.net
Tue Dec 19 21:15:44 EST 2006


On Tuesday 19 December 2006 15:32, Paul Arthur wrote:
> On 2006-12-19, vertigo <spam at spam.pl> wrote:
> > Hello
> >
> >> Take a look at Chapter 8 of 'Dive Into Python.'
> >> http://diveintopython.org/toc/index.html
> >
> > i read whole regexp chapter -
>
> Did you read Chapter 8?  Regexes are 7; 8 is about processing HTML.
> Regexes are not well suited to this type of processing.
>
> > but there was no solution for my problem.
> > Example:
> >
> > re.sub("<!--.*-->","",htmldata)
> > would remove only comments which are in one line.
> > If comment is in many lines like this:
> ><!--start
> > of
> > commend, end-->
> >
> > it would not work. That's because '.' does not match '\n'.
> >
> > Does anybody know a solution for this particular problem?
>
> Yes.  Use DOTALL mode.

Paul, I mentioned Chapter 8 so that Vertigo would take a look at the HTML
processing section. What Vertigo wants can be done with relative ease using
the sgmllib module.
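
As a rough illustration of the parser-based approach, here is a minimal
sketch. Note the assumptions: sgmllib exists only in Python 2 (it was removed
in Python 3), so this sketch swaps in the standard html.parser module, which
offers a similar handler-style interface. The names CommentStripper and
strip_comments are made up for this example. It re-emits everything the
parser sees except comments, and it does not try to preserve processing
instructions or unusual markup exactly.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Hypothetical helper: re-emit the parsed HTML, dropping comments."""

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Reuse the tag text exactly as it appeared in the input.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)

    def handle_comment(self, data):
        # Comments are dropped, whether they span one line or many.
        pass


def strip_comments(html):
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_comments("<p>a<!--start\nof\ncomment, end-->b</p>"))
```

Because the parser tracks where a comment starts and ends, newlines inside
the comment need no special handling.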

Anyway, if you (Vertigo) want to use regular expressions for this, you can
try one of the regular expression testing programs. I'm not quite sure of
the name, but there is one that comes with KDE.
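
For the regular expression route, a minimal sketch of the DOTALL suggestion
quoted above might look like the following. The re.DOTALL flag makes '.'
match '\n' as well. The non-greedy '.*?' is an addition of this sketch, not
part of the original pattern: with a greedy '.*' in DOTALL mode, one match
would run from the first '<!--' to the last '-->' and swallow everything in
between. The names COMMENT_RE and remove_comments are only for illustration.

```python
import re

# re.DOTALL lets '.' match newlines; '.*?' stops at the first '-->'.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def remove_comments(htmldata):
    """Remove HTML comments, including ones spanning several lines."""
    return COMMENT_RE.sub("", htmldata)


htmldata = (
    "<p>one</p><!--start\nof\ncomment, end--><p>two</p>"
    "<!-- x --><p>three</p>"
)
print(remove_comments(htmldata))
# prints: <p>one</p><p>two</p><p>three</p>
```

This still treats the page as plain text, so it can be fooled by '<!--'
appearing inside a script or an attribute value.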

- Jonathan Curran


