[ python-Bugs-1074200 ] xml.dom.minidom produces errors with certain unicode chars
SourceForge.net
noreply at sourceforge.net
Sat Nov 27 15:02:32 CET 2004
Bugs item #1074200, was opened at 2004-11-27 14:58
Message generated for change (Comment added) made by lemburg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1074200&group_id=5470
Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Peer Janssen (peerjanssen)
>Assigned to: Nobody/Anonymous (nobody)
Summary: xml.dom.minidom produces errors with certain unicode chars
Initial Comment:
(Note: I tried to file this before, but it didn't show
up in the list, so I am trying again.)
In an XML document generated by Trados Translators
Workbench (a TMX V 1.1 Translation Memory), the Unicode
characters START OF HEADING (U+0001, see
http://www.fileformat.info/info/unicode/char/0001/index.htm)
and SINGLE LOW-9 QUOTATION MARK (U+201A, see
http://www.fileformat.info/info/unicode/char/201a/index.htm)
produce errors when the document is parsed from a file
with "xml.dom.minidom".
The first one (0001) produces this output:
Traceback (most recent call last):
  File "G:\_Prog\TMworks\domtree.py", line 7, in ?
    dom=parse(tm)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
    return expatbuilder.parse(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 420, column 106
The second one (201A) produces this output:
Traceback (most recent call last):
  File "G:\_Prog\TMworks\domtree.py", line 7, in ?
    dom=parse(tm)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
    return expatbuilder.parse(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: mismatched tag: line 624, column 2
Deleting these two characters throughout the document
produces the desired result.
I don't see why these characters should be a problem,
especially the quotation mark.
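For reference, the XML 1.0 specification does not allow
U+0001 anywhere in a document, whereas U+201A is a legal
character. The workaround of deleting characters could be
limited to the illegal ones, roughly as in the sketch below.
It assumes Python 3; the helper name strip_invalid_xml_chars
and the sample string are made up for illustration and are
not taken from the TMX file.

```python
import re
from xml.dom.minidom import parseString

# Matches any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Hypothetical helper: drop characters that XML 1.0 forbids."""
    return _INVALID_XML_CHARS.sub("", text)


# Illustrative input containing both characters from the report.
raw = "<tu><seg>a\u0001b \u201Aquoted\u2018</seg></tu>"
clean = strip_invalid_xml_chars(raw)

# Only U+0001 is removed; U+201A survives and parses once the
# document is encoded consistently (UTF-8 here).
dom = parseString(clean.encode("utf-8"))
text = dom.documentElement.firstChild.firstChild.data
print(repr(text))
```

This only addresses the U+0001 case. It does not explain
the "mismatched tag" error reported for U+201A, which would
need the original file to diagnose.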
----------------------------------------------------------------------
>Comment By: M.-A. Lemburg (lemburg)
Date: 2004-11-27 15:02
Message:
Logged In: YES
user_id=38388
Please provide an example that lets us reproduce
the error.
Unassigning, since I'm not an expert on minidom.
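A reproduction could be as small as the following sketch.
It assumes Python 3, and the two sample documents are
invented for illustration rather than taken from the
reporter's file; try_parse is a hypothetical helper name.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def try_parse(data):
    """Return the expat error message, or None if parsing succeeds."""
    try:
        parseString(data)
    except ExpatError as exc:
        return str(exc)
    return None


# U+0001 is not a legal XML 1.0 character, so expat rejects it.
with_soh = "<seg>a\u0001b</seg>".encode("utf-8")
# U+201A is legal; encoded as UTF-8 it is expected to parse.
with_quote = "<seg>\u201Aquote</seg>".encode("utf-8")

print(try_parse(with_soh))
print(try_parse(with_quote))
```

If the second document parses cleanly, the U+201A failure
in the report probably depends on something else in the
original file, so attaching that file or a trimmed excerpt
would still be needed.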
----------------------------------------------------------------------