parsing "&A" in a string..

Tim Roberts timr at probo.com
Mon Sep 1 00:26:48 EDT 2008


"bruce" <bedouglas at earthlink.net> wrote:
>
>it's the beautifulsoup() that's taking the "&E" and giving the "&E;"...

Right, as it should.  "A&E" is not valid HTML, and beautifulsoup expects
valid HTML.

This can be difficult to fix in the general case, because your page might
already contain "&amp;".  If it is possible that some of the ampersands are
escaped while others are not, you can do something like:

    s = s.replace('&amp;', '&').replace('&', '&amp;')
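Here is a minimal sketch of that one-liner wrapped in a function, run on a string that mixes a bare "&" with an already-escaped "&amp;". The function names are hypothetical. One caveat the one-liner has: any other entity in the page, such as "&lt;", also gets its "&" escaped and becomes "&amp;lt;". The second function swaps in a different technique, not the one suggested above. It uses a regular expression to escape only ampersands that do not already start something that looks like an entity or character reference. That heuristic also leaves text like "A&E;" untouched, because it cannot tell that apart from a real entity.

```python
import re


def fix_ampersands(s):
    """Escape every ampersand exactly once (the replace-chain approach).

    Existing "&amp;" sequences are first collapsed back to a bare "&",
    then every "&" is escaped, so partly escaped text does not end up
    as "&amp;amp;".  Other entities such as "&lt;" WILL be mangled.
    """
    return s.replace('&amp;', '&').replace('&', '&amp;')


# Hypothetical alternative (regex heuristic): an "&" counts as already
# escaped if it is followed by an optional "#", word characters, and ";",
# e.g. "&amp;", "&lt;", "&#38;" or "&#x26;".
BARE_AMP = re.compile(r'&(?!#?\w+;)')


def escape_bare_ampersands(s):
    """Escape only ampersands that do not begin an entity reference."""
    return BARE_AMP.sub('&amp;', s)


if __name__ == '__main__':
    page = 'A&E and Providenza &amp; Boekelheide'
    print(fix_ampersands(page))          # A&amp;E and Providenza &amp; Boekelheide
    print(escape_bare_ampersands(page))  # A&amp;E and Providenza &amp; Boekelheide
```

Either result can then be handed to BeautifulSoup without "&E" being read as an unterminated entity.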
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.


