[TriZPUG] Regular expressions- my son getting me stuck!

Tim Arnold jtim.arnold at gmail.com
Wed Jun 13 21:28:49 CEST 2012


Yes, BeautifulSoup is great. I also use lxml, but it is stricter about
the HTML it can consume. If you have to parse any old HTML,
BeautifulSoup is your friend. Otherwise, if your HTML is really valid,
lxml has a great interface and is of course also useful for XML
parsing and validation.
--Tim
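BeautifulSoup and lxml are both third-party installs. As a dependency-free sketch of the same "use a parser, not a regex" idea, here is a link extractor built on the standard library's html.parser.HTMLParser, which is lenient about loosely formed markup. This is a swapped-in stand-in, not BeautifulSoup's or lxml's API, and the LinkCollector class, extract_links function and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes lower-cased tag names and a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Feed the markup through the parser and return the hrefs in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Messy, not-quite-valid HTML: upper-case tag, unquoted attribute, unclosed <p>
sample = '<p>See <A  HREF="one.html">one</a> and <a href=two.html>two</a><p>'
print(extract_links(sample))  # → ['one.html', 'two.html']
```

HTMLParser lower-cases tag and attribute names before calling handle_starttag, which is why the upper-case HREF is still picked up.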


On Wed, Jun 13, 2012 at 3:25 PM,  <lionface.lemonface at gmail.com> wrote:
> BeautifulSoup... I think there's also one that exposes a jQuery-like syntax (pyquery?)
>
> JJ
> Sent on the Sprint® Now Network from my BlackBerry®
>
> -----Original Message-----
> From: Mark Browne <M.Browne at andor.com>
> Date: Wed, 13 Jun 2012 19:24:22
> To: lionface.lemonface at gmail.com<lionface.lemonface at gmail.com>; Triangle (North Carolina) Zope and Python Users Group<trizpug at python.org>
> Subject: RE: [TriZPUG] Regular expressions- my son getting me stuck!
>
> Hi
>
> Thanks. Can you recommend a good HTML parser?
>
> Mark
>
> -----Original Message-----
> From: trizpug-bounces+m.browne=andor.com at python.org [mailto:trizpug-bounces+m.browne=andor.com at python.org] On Behalf Of lionface.lemonface at gmail.com
> Sent: 13 June 2012 15:17
> To: Triangle (North Carolina) Zope and Python Users Group
> Subject: Re: [TriZPUG] Regular expressions- my son getting me stuck!
>
> I'd suggest using an HTML parsing library to do this, but the mistake in your regexp is the '.+' part: you want to match any _whitespace_, not any character. \s+ should work (\s matches any whitespace char).
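Mark's original regex isn't quoted anywhere in the thread, so the patterns and sample line below are hypothetical. They sketch how '.+' between the tag name and the attribute can run past the tag you meant, while '\s+' only allows whitespace there.

```python
import re

# Hypothetical patterns: the regex from the original question is not shown.
# '.+' between "<a" and "href" may match anything, including other tags.
loose = re.compile(r'<a.+href="(.*?)"')
# '\s+' only allows whitespace between "<a" and "href".
strict = re.compile(r'<a\s+href="(.*?)"')

html = '<a href="one.html">one</a> <a href="two.html">two</a>'

# Greedy .+ runs to the end of the line, then backs up to the LAST href="
print(loose.findall(html))   # → ['two.html']
print(strict.findall(html))  # → ['one.html', 'two.html']
```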
>
> But also be careful with .* inside the attribute value: it will match " and <> too (the non-greedy ? flag helps here, but don't rely on it elsewhere!).
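A sketch of the greedy versus non-greedy point, using a hypothetical line that holds two links. The negated character class [^"]* is a common alternative that the thread doesn't mention; it simply cannot match a quote. The last lines show why .*? alone is not a guarantee: when the text after it must also match, it keeps expanding past quotes until it does.

```python
import re

html = '<a href="one.html">one</a> <a href="two.html">two</a>'

# Greedy .* runs to the LAST quote on the line, swallowing ", < and > on the way
greedy = re.compile(r'<a\s+href="(.*)"')
# Non-greedy .*? stops at the first quote that lets the rest of the pattern match
lazy = re.compile(r'<a\s+href="(.*?)"')
# A negated class [^"]* can never cross a quote, whatever follows it
negated = re.compile(r'<a\s+href="([^"]*)"')

print(greedy.findall(html))   # → ['one.html">one</a> <a href="two.html']
print(lazy.findall(html))     # → ['one.html', 'two.html']
print(negated.findall(html))  # → ['one.html', 'two.html']

# Requiring '">' right after the value makes .*? expand across the class attribute
html2 = '<a href="x.html" class="nav">x</a>'
print(re.findall(r'<a\s+href="(.*?)">', html2))  # → ['x.html" class="nav']
```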
>
> HTH,
> JJ
>
> -----Original Message-----
> From: Mark Browne <M.Browne at andor.com>
> Sender: trizpug-bounces+lionface.lemonface=gmail.com at python.org
> Date: Wed, 13 Jun 2012 18:33:55
> To: trizpug at python.org<trizpug at python.org>
> Reply-To: "Triangle \(North Carolina\) Zope and Python Users Group"
>        <trizpug at python.org>
> Subject: [TriZPUG] Regular expressions- my son getting me stuck!
>
> _______________________________________________
> TriZPUG mailing list
> TriZPUG at python.org
> http://mail.python.org/mailman/listinfo/trizpug
> http://trizpug.org is the Triangle Zope and Python Users Group

