[ python-Bugs-1074200 ] xml.dom.minidom produces errors with certain unicode chars
SourceForge.net
noreply at sourceforge.net
Sat Nov 27 15:29:14 CET 2004
Bugs item #1074200, was opened at 2004-11-27 13:58
Message generated for change (Comment added) made by peerjanssen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1074200&group_id=5470
Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Peer Janssen (peerjanssen)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.dom.minidom produces errors with certain unicode chars
Initial Comment:
(Note: I tried to file this before, but it didn't show
up in the list, so I am trying again.)
In an XML document generated by Trados Translator's
Workbench (a TMX V 1.1 Translation Memory), the Unicode
characters START OF HEADING (U+0001, see
http://www.fileformat.info/info/unicode/char/0001/index.htm)
and SINGLE LOW-9 QUOTATION MARK (U+201A, see
http://www.fileformat.info/info/unicode/char/201a/index.htm)
produce errors when the document is parsed from a file with
"xml.dom.minidom".
The first one (0001) produces this output:
Traceback (most recent call last):
  File "G:\_Prog\TMworks\domtree.py", line 7, in ?
    dom=parse(tm)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
    return expatbuilder.parse(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 420, column 106
The second one (201A) produces this output:
Traceback (most recent call last):
  File "G:\_Prog\TMworks\domtree.py", line 7, in ?
    dom=parse(tm)
  File "C:\Python23\lib\xml\dom\minidom.py", line 1919, in parse
    return expatbuilder.parse(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python23\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: mismatched tag: line 624, column 2
Deleting every occurrence of these two characters from the
document makes the parse succeed.
I don't see why these characters should be a problem,
especially the quotation mark.
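
For illustration, here is a minimal sketch of the behaviour described
above, written for current Python 3 rather than the Python 2.3 used in
this report. The helper names strip_invalid_xml_chars and is_well_formed
are hypothetical and are not part of xml.dom.minidom. XML 1.0 does not
allow U+0001 anywhere in a document (it lies outside the specification's
Char production), so expat rejecting it is expected. U+201A is a legal
character and parses when the input is correctly encoded, which
suggests, but does not prove, that the second error comes from
something else in the file.

```python
import re
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Code points outside the XML 1.0 "Char" production: C0 controls other
# than tab, LF and CR, the surrogate range, and U+FFFE / U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text):
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper for illustration only.
    """
    return _INVALID_XML_CHARS.sub("", text)


def is_well_formed(text):
    """Return True if minidom can parse text (encoded as UTF-8)."""
    try:
        parseString(text.encode("utf-8"))
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    bad = "<seg>a\x01b</seg>"          # contains U+0001
    quoted = "<seg>\u201aquoted</seg>"  # contains U+201A
    print(is_well_formed(bad))
    print(is_well_formed(strip_invalid_xml_chars(bad)))
    print(is_well_formed(quoted))
```

Stripping control characters changes the data, so this is at most a
workaround for the parse step and not a fix for whatever produced the
file.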
----------------------------------------------------------------------
>Comment By: Peer Janssen (peerjanssen)
Date: 2004-11-27 14:29
Message:
Logged In: YES
user_id=896722
The file.
----------------------------------------------------------------------
Comment By: Peer Janssen (peerjanssen)
Date: 2004-11-27 14:27
Message:
Logged In: YES
user_id=896722
Here is a zip file with a test program, domtree.py, and two
test files. I noticed that the first test file triggers its
error only on my Windows box, but the second test file
produces an error on both my Windows and my Linux box.
The Windows Python version is:
Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit
(Intel)] on win32
The Linux Python version is:
Python 2.3.3. (#2, Feb 17, 2004, 11:45:40) [GCC 3.3.2
(Mandrake Linux 10.0 3.3.2-6mdk)] on linux2
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2004-11-27 14:02
Message:
Logged In: YES
user_id=38388
Please provide an example that lets us reproduce
the error.
Unassigning, since I'm not a minidom expert.
----------------------------------------------------------------------