[Tutor] finding mismatched or unpaired html tags

Dinesh B Vadhia dineshbvadhia at hotmail.com
Tue Apr 28 16:41:36 CEST 2009


This is the error and traceback:

Unexpected error opening J:/F2/....html: mismatched tag: line 124, column 8

Traceback (most recent call last):
  File "C:\....py", line 492, in <module>
    raw = extractText(xhtmlfile)
  File "C:\....py", line 334, in extractText
    tree = make_tree(xhtmlfile)
  File "....py", line 169, in make_tree
    return tree
UnboundLocalError: local variable 'tree' referenced before assignment
 

Here is line 124 (the error points at column 8). I cannot see any obviously missing or mismatched tags:

"<p>As to the present time I am unable physical and mentally to secure all this information at present.</p>"

Dinesh




From: Kent Johnson 
Sent: Tuesday, April 28, 2009 7:13 AM
To: Dinesh B Vadhia 
Cc: tutor at python.org 
Subject: Re: [Tutor] finding mismatched or unpaired html tags


On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> I'm processing tens of thousands of html files. A few of them contain
> mismatched tags, and ElementTree throws the error:
>
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124,
> column 8"
>
> I now want to scan each file and simply identify each mismatched or unpaired
> tag (by line number) in each file.  I've read the ElementTree docs and
> cannot see an obvious way to do this.  I know this is a common problem
> but I'm feeling a bit clueless here - any ideas?

It seems like the exception gives you the line number. What kind of
exception is raised? The exception object may contain the line and
column in a more accessible form, so you could catch the exception,
get the line number, then read that line out of the file and show it.
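
A minimal sketch of that idea, assuming a recent Python in which
ElementTree raises xml.etree.ElementTree.ParseError with a `position`
attribute (older versions raise xml.parsers.expat.ExpatError, which
carries `lineno` and `offset` instead). The helper name
find_parse_error is made up for illustration:

```python
import xml.etree.ElementTree as ET


def find_parse_error(path):
    """Return None if the file parses cleanly.

    Otherwise return (line, column, message, source_line) for the
    first error the parser reports. Hypothetical helper, not part
    of ElementTree.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # position is a (line, column) tuple; line numbers start at 1
        line_no, column = err.position
        source_line = ""
        # Re-read the file as text to pull out the offending line.
        # The encoding is an assumption; adjust it for your files.
        with open(path, encoding="utf-8", errors="replace") as f:
            for current, text in enumerate(f, 1):
                if current == line_no:
                    source_line = text.rstrip("\n")
                    break
        return line_no, column, str(err), source_line
    return None
```

The parser stops at the first problem it finds, so this reports at
most one error per file. Calling it in a loop over your files and
printing the non-None results should give you a list to work through.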

Kent