[Tutor] finding mismatched or unpaired html tags
Dinesh B Vadhia
dineshbvadhia at hotmail.com
Tue Apr 28 16:41:36 CEST 2009
This is the error and traceback:
Unexpected error opening J:/F2/....html: mismatched tag: line 124, column 8
Traceback (most recent call last):
  File "C:\....py", line 492, in <module>
    raw = extractText(xhtmlfile)
  File "C:\....py", line 334, in extractText
    tree = make_tree(xhtmlfile)
  File "....py", line 169, in make_tree
    return tree
UnboundLocalError: local variable 'tree' referenced before assignment
Here is line 124 (the error is reported at column 8), and I cannot see any obviously missing or mismatched tags:
"<p>As to the present time I am unable physical and mentally to secure all this information at present.</p>"
Dinesh
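The UnboundLocalError is a second problem layered on top of the parse error. Judging from the traceback, make_tree catches the parse exception, prints the "Unexpected error opening ..." message, and then still reaches `return tree` on line 169, where `tree` was never assigned. The body of make_tree is not in the post, so the following is a hypothetical reconstruction (Python 3, standard-library xml.etree.ElementTree) showing one way to make every path through the function return something defined:

```python
import xml.etree.ElementTree as ET


def make_tree(xhtmlfile):
    """Parse xhtmlfile; return an ElementTree, or None if it is not well-formed.

    Hypothetical reconstruction: the original make_tree is not shown in the
    post. The point is that every path must either assign `tree` before
    returning it, or return/raise something else explicitly.
    """
    try:
        tree = ET.parse(xhtmlfile)
    except ET.ParseError as err:
        # Report the problem, then return explicitly instead of falling
        # through to `return tree` with `tree` still unassigned.
        print("Unexpected error opening %s: %s" % (xhtmlfile, err))
        return None
    return tree
```

Callers such as extractText would then need to check for None. Alternatively, the except clause can re-raise after printing if the caller is the right place to handle the failure.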
From: Kent Johnson
Sent: Tuesday, April 28, 2009 7:13 AM
To: Dinesh B Vadhia
Cc: tutor at python.org
Subject: Re: [Tutor] finding mismatched or unpaired html tags
On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> I'm processing tens of thousands of html files and a few of them contain
> mismatched tags and ElementTree throws the error:
>
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124,
> column 8"
>
> I now want to scan each file and simply identify each mismatched or unpaired
> tag (by line number). I've read the ElementTree docs and
> cannot see any obvious way to do this. I know this is a common problem
> but I'm feeling a bit clueless here - any ideas?
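ElementTree's parser stops at the first error, so it reports at most one problem per file. One way to list every unbalanced tag with its line number is a separate pass with the standard library's more lenient html.parser.HTMLParser, keeping a stack of open elements. This is an illustration, not an ElementTree feature: the class and function names are hypothetical, and it assumes XHTML-style markup where every non-void element is closed explicitly (plain HTML that omits optional end tags such as </p> or </li> would be flagged).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML; XHTML writes them as <br/>.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Collect (line, column, message) for every unbalanced tag.

    Unlike ElementTree, HTMLParser does not stop at the first problem,
    so a single pass can report all of them.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as (tag, line, column)
        self.problems = []  # (line, column, message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            line, col = self.getpos()  # position of the "<" of this tag
            self.stack.append((tag, line, col))

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> closes itself, so there is nothing to track

    def handle_endtag(self, tag):
        line, col = self.getpos()
        if tag in VOID:
            return
        if not any(open_tag == tag for open_tag, _, _ in self.stack):
            self.problems.append((line, col, "stray </%s>" % tag))
            return
        # Everything opened after the matching start tag was never closed.
        while self.stack[-1][0] != tag:
            open_tag, oline, ocol = self.stack.pop()
            self.problems.append(
                (oline, ocol, "<%s> not closed before </%s> at line %d"
                 % (open_tag, tag, line)))
        self.stack.pop()

    def close(self):
        super().close()  # flush buffered input first
        for open_tag, oline, ocol in self.stack:
            self.problems.append((oline, ocol, "<%s> never closed" % open_tag))
        self.stack.clear()


def check_tags(text):
    """Return the problems found in `text`, sorted by position."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return sorted(checker.problems)
```

To scan a file, pass it the file's text, e.g. check_tags(open(path, encoding="utf-8").read()), with the encoding adjusted to match the files. Lines count from 1 and columns from 0, as HTMLParser.getpos() reports them.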
It seems like the exception gives you the line number. What kind of
exception is raised? The exception object may contain the line and
column in a more accessible form, so you could catch the exception,
get the line number, then read that line out of the file and show it.
Kent
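Kent's suggestion can be sketched as follows for Python 3. xml.etree.ElementTree.ParseError carries a `position` attribute holding (line, column), so the handler can read that line back out of the file and show it. The function name is hypothetical, and reading the file as UTF-8 is an assumption; ElementTree itself decodes according to the file's XML declaration.

```python
import xml.etree.ElementTree as ET


def show_parse_error(path):
    """Parse `path`; return None if it is well-formed, else a report string.

    The report holds the ParseError message, the offending source line,
    and a caret under the reported column. UTF-8 is an assumed encoding.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line_no, col = err.position  # line counts from 1, column from 0
        # Text mode translates \r\n and \r to \n, so splitting on "\n"
        # numbers lines the way the parser does for those line endings.
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = f.read().split("\n")
        text = lines[line_no - 1] if 1 <= line_no <= len(lines) else ""
        return "%s: %s\n%s\n%s^" % (path, err, text, " " * col)
    return None
```

Printing the returned string shows the error, the offending line, and a caret under the reported column (tabs in the line will shift the caret).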