[Tutor] Clarified: Best way to alter sections of a string which match dictionary keys?
Karl Pflästerer
sigurd at 12move.de
Sat Jan 3 18:48:16 EST 2004
On 3 Jan 2004, SSokolow <- from_python_tutor at SSokolow.com wrote:
> Your reply is confusing me but as I understand it, there are three
> problems with this:
I didn't mean to confuse you.
[...]
> 2. What do you mean safer? The situation may not apply to this
The regexp isn't 100% safe against badly written (or broken) HTML. A
match starts with `<a', then come some attributes, then somewhere there
is `href='. I'm not absolutely sure at the moment how much the syntax
may vary (I would have to reread the W3C docs). Furthermore, you need to
cope with both HTML and XHTML; the latter should be the smaller problem,
as it is much stricter, but HTML may vary a lot because many people
don't read the W3C docs. I also think you need to cope with spaces
between `href=' and the attribute's value. And the value may be in
single or double quotes (it should be double).
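A tolerant regexp covering the variations just mentioned might look like
the sketch below. It assumes the goal is only to pull out href values;
the names HREF_RE and find_hrefs are made up for illustration. It
accepts upper- or lower-case names, other attributes before href,
whitespace around `=', and double-quoted, single-quoted or unquoted
values. It is still not an HTML parser: an attribute value that itself
contains ` href=' would fool it, for example.

```python
import re

# Hypothetical pattern, a sketch only: it matches the start of an <a> tag
# up to and including the value of its href attribute.
HREF_RE = re.compile(
    r"""<a\s                      # "<a" followed by whitespace (so <abbr> is skipped)
        (?:[^>]*?\s)?             # any other attributes, ending in whitespace
        href\s*=\s*               # the attribute name, spaces allowed around "="
        (?:"(?P<dq>[^"]*)"        # a double-quoted value,
          |'(?P<sq>[^']*)'        # a single-quoted value,
          |(?P<uq>[^\s"'>]+))     # or an unquoted value
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hrefs(html):
    """Return the href value of every <a> tag that HREF_RE matches, in order."""
    values = []
    for m in HREF_RE.finditer(html):
        # exactly one of the three value groups took part in the match
        values.append(next(v for v in m.group("dq", "sq", "uq") if v is not None))
    return values


print(find_hrefs('<A HREF="one.html">1</A> <a class="x" href = \'two.html\'>2</a>'))
# ['one.html', 'two.html']
```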
That's not the biggest problem; all this can be handled with a regexp.
But if you had the (pathological) case where somebody writes
<a ....> <a </a> .. </a>
a regexp will fail. Maybe that never happens, or only once in a
million. If you can live with that, fine.
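To see how the pathological case defeats a regexp, here is a sketch
with a simple non-greedy pattern (LINK_RE and link_texts are
hypothetical names, not from this thread). The pattern pairs each
`<a ...>' with the next `</a>', so the stray `<a ' lands inside the
first match and the real end of the link text is lost.

```python
import re

# A deliberately simple pattern: "<a ...>", then as little text as possible, then "</a>".
LINK_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def link_texts(html):
    """Return the text the pattern believes sits inside each link."""
    return LINK_RE.findall(html)


# A concrete version of the pathological markup: a stray "<a " inside a link.
broken = '<a href="outer.html"> outer <a </a> text </a>'
print(link_texts(broken))  # [' outer <a ']  (" text " is lost)
```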
[...]
> I also forgot to mention that the variable string does not hold the
> entire file. This is run for each chunk of data as it's received from
> the server. (I don't know how to build content-layer filtering into
> the proxy code I'm extending so I hooked it in at the content layer.
> testing has shown that some links lie across chunk boundaries like
> this:
> [continued from previous chunk]is some link text</a>
> .
> .
> .
> <a href="whatever">This is th[continued in next chunk]
> and I don't know if the HTML parser might stumble on an unclosed <a>
> tag pair.
The parser copes with that very well. You would just have to change the
code a bit, but that should be possible.
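As a sketch of what changing the code a bit could mean, assuming
Python's html.parser module (the module was called HTMLParser in the
Python of 2004): feed() may be called once per chunk, and the parser
buffers an incomplete tag until the rest of it arrives. Text, however,
may reach handle_data() in several pieces, so the handler below collects
the pieces and joins them at the closing tag. LinkCollector is a
hypothetical name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a href=...> elements fed in arbitrary chunks."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text pieces seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        # tag names arrive lower-cased; attrs is a list of (name, value) pairs
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # text spanning a chunk boundary may arrive in several pieces
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# Both the tag and the link text are split across chunk boundaries here.
chunks = ['<p>See <a hr', 'ef="whatever">This is th', 'e link text</a> now.</p>']
parser = LinkCollector()
for chunk in chunks:
    parser.feed(chunk)
parser.close()
print(parser.links)  # [('whatever', 'This is the link text')]
```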
But if speed matters, I think the simple regexp solution might suffice.
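If the regexp route is taken instead, the chunk boundaries still have to
be handled by hand. A minimal sketch, assuming the filter sees the
chunks in order and that rewrite is a hypothetical callback returning
the replacement text for one whole `<a ...>...</a>' match: complete
links are rewritten, and everything from an unclosed `<a' onwards (or a
lone trailing `<') is held back and prepended to the next chunk. It
inherits the regexp's blind spots discussed above, and an `<a' that is
never closed keeps the held-back text growing until the stream ends.

```python
import re

# Hypothetical names throughout; this is not the original poster's proxy code.
LINK_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
OPEN_RE = re.compile(r"<a\b", re.IGNORECASE)


def filter_chunks(chunks, rewrite):
    """Yield filtered text, applying rewrite(match) to every complete <a>...</a>.

    Text that might be the start of a link split across a chunk boundary
    is held back and prepended to the next chunk.
    """
    pending = ""
    for chunk in chunks:
        data = pending + chunk
        out, pos = [], 0
        for m in LINK_RE.finditer(data):
            out.append(data[pos:m.start()])  # text before this link, unchanged
            out.append(rewrite(m))           # the link itself, rewritten
            pos = m.end()
        # Hold back from the first unclosed "<a" after the last complete link,
        # or from a lone trailing "<" that might become "<a" in the next chunk.
        start = OPEN_RE.search(data, pos)
        if start:
            cut = start.start()
        elif data.endswith("<"):
            cut = len(data) - 1
        else:
            cut = len(data)
        out.append(data[pos:cut])
        pending = data[cut:]
        yield "".join(out)
    yield pending  # flush whatever is left when the stream ends


chunks = ['x <a href="w">This is th', 'e link</a> y <', 'a href="z">z</a>']
pieces = list(filter_chunks(chunks, lambda m: m.group(0).upper()))
print("".join(pieces))  # x <A HREF="W">THIS IS THE LINK</A> y <A HREF="Z">Z</A>
```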
[...]
I find the problem interesting, so post here if you find out more (but
please with as many facts as possible).
Karl
--
Please do *not* send copies of replies to me.
I read the list.