[Tutor] Custom (non-standard) SGML Parser and OOP Problem

Adam Kessel adam@bostoncoop.net
Sun May 18 22:09:50 2003



I'm writing a custom SGML parser. I started to do it from scratch, but
then it occurred to me that I ought not to reinvent the wheel (the wheel
being sgmllib).

I want to use sgmllib-like parsing, but I only want to recognize tags
matching:

<$tag$>

sgmllib is, of course, looking for <[A-Za-z] to start tags, so these
sorts of tags are not detected. I'd also like to not have to deal with
any tags other than <$tag$>.
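To make the mismatch concrete, here is a small sketch comparing a start-of-tag test modeled on the <[A-Za-z] pattern described above with a hypothetical full pattern for <$tag$>. It assumes tag names are word characters, which the post does not actually specify.

```python
import re

# Start-of-tag test modeled on the "<[A-Za-z]" check described above.
sgml_start = re.compile('<[A-Za-z]')
# Hypothetical full-tag pattern for <$tag$>; assumes tag names are word characters.
dollar_tag = re.compile(r'<\$(\w+)\$>')

sample = 'Dear <$name$>, see <b>this</b>.'
print([m.group() for m in sgml_start.finditer(sample)])  # ['<b']
print(dollar_tag.findall(sample))                         # ['name']
```

The letter-based test only fires on <b, while the dollar pattern picks up only the marker.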

I discovered I could violate a central tenet of OOP and get this to work,
by putting the following in my code:

sgmllib.starttagopen = re.compile(r'<[>\$]')
(etc.)

That is, overwriting the regexps used in sgmllib to locate tags. But this
seems like a dangerous way to do things. Putting the following in my
parser __init__ doesn't work:

self.starttagopen = re.compile(r'<[>\$]')

That's because starttagopen is defined not in sgmllib.SGMLParser but at
module level in sgmllib itself.
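The reason the __init__ assignment has no effect is Python name lookup: a method that refers to a bare module-level name reads the module's global, never an attribute on self. The sketch below reproduces that with hypothetical stand-in classes (sgmllib was removed in Python 3, so it is not imported here), then shows a design in which the pattern is a class attribute that a subclass can override.

```python
import re

# Stand-in for a library module: the pattern lives at module level,
# so methods that use it look it up as a global.
starttagopen = re.compile('<[A-Za-z]')

class ModuleLevelParser:
    def starts_tag(self, text, i):
        return starttagopen.match(text, i) is not None   # global lookup

p = ModuleLevelParser()
p.starttagopen = re.compile(r'<\$')   # ignored: starts_tag never reads self.starttagopen
print(p.starts_tag('<$tag$>', 0))     # False

# Hypothetical subclass-friendly design: the pattern is a class attribute.
class ClassLevelParser:
    starttagopen = re.compile('<[A-Za-z]')

    def starts_tag(self, text, i):
        return self.starttagopen.match(text, i) is not None  # attribute lookup

class DollarParser(ClassLevelParser):
    starttagopen = re.compile(r'<\$')   # overridden only in the subclass

print(DollarParser().starts_tag('<$tag$>', 0))      # True
print(ClassLevelParser().starts_tag('<$tag$>', 0))  # False
```

With the class-attribute layout, the override stays local to DollarParser and never touches other users of the base class.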

What's the right way to do this? My question, boiled down, is: how do I
use sgmllib's functionality while overriding its internals, without being
a bad programmer?
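One way to avoid patching sgmllib at all, assuming only <$tag$> markers matter and everything else can be treated as text, is a small parser that keeps sgmllib's callback style: a subclass overrides handler methods, and the tag pattern is a class attribute rather than a module global. The class, method, and pattern names here are hypothetical, not sgmllib's API.

```python
import re

class DollarTagParser:
    """Hypothetical sgmllib-style parser that recognizes only <$tag$> markers.

    Subclasses override handle_data() and handle_tag(); everything else,
    including ordinary <b>-style tags, is passed through as plain data.
    """
    # Class attribute, so a subclass can swap in a different pattern.
    tagpattern = re.compile(r'<\$(\w+)\$>')

    def feed(self, text):
        pos = 0
        for m in self.tagpattern.finditer(text):
            if m.start() > pos:
                self.handle_data(text[pos:m.start()])  # text before the marker
            self.handle_tag(m.group(1))                # the marker's name
            pos = m.end()
        if pos < len(text):
            self.handle_data(text[pos:])               # trailing text

    def handle_data(self, data):
        pass

    def handle_tag(self, name):
        pass


class Collector(DollarTagParser):
    """Example subclass that records the events it receives."""
    def __init__(self):
        self.events = []

    def handle_data(self, data):
        self.events.append(('data', data))

    def handle_tag(self, name):
        self.events.append(('tag', name))


c = Collector()
c.feed('Hi <$name$>, <b>bold</b><$end$>')
print(c.events)
```

Unlike sgmllib, this feed() does not buffer input between calls, so a marker split across two feed() calls would be missed; joining the text before feeding it avoids that.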

--Adam Kessel
