[Tutor] XML parsing when elements contain foreign characters
Garry Bettle
garry.bettle at gmail.com
Thu Jan 9 09:50:24 CET 2014
Howdy all,
Have you heard the news? Happy New Year!
I hope someone can help. I know this is a tutor list, so please feel free to
point me somewhere else.
I'm trying to parse some XML, and I'm struggling to reference elements whose
tag names contain non-ASCII characters, such as AntalPåLager.
Code so far:
# -*- coding: utf-8 -*-
from xml.dom import minidom
xmldoc = minidom.parse('Export.xml')
products = xmldoc.getElementsByTagName('product')
print '%s Products' % len(products)
row_cnt = 0
titles = {}
stocklevel = {}
for product in products:
    row_cnt += 1
    title = product.getElementsByTagName('Titel')[0].firstChild.nodeValue
    stock = product.getElementsByTagName('AntalPåLager')[0].firstChild.nodeValue
    if title not in titles:
        titles[title] = 1
    else:
        titles[title] += 1
    if stock not in stocklevel:
        stocklevel[stock] = 1
    else:
        stocklevel[stock] += 1
Running it gives:
Traceback (most recent call last):
  File "C:\Python27\Testing Zizzi.py", line 16, in <module>
    stock=product.getElementsByTagName('AntalPåLager')[0].firstChild.nodeValue
IndexError: list index out of range
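The IndexError means that getElementsByTagName() returned an empty list for that tag name, so [0] had nothing to index. Below is a minimal sketch of the same loop, assuming Python 3 (where every string literal is Unicode). It uses a small, hypothetical sample document standing in for Export.xml, and the helper child_text is made up for illustration. collections.Counter takes the place of the hand-written counting dictionaries.

```python
from collections import Counter
from xml.dom import minidom

# Hypothetical stand-in for Export.xml; the element names mirror the post.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<products>
  <product><Titel>Margherita</Titel><AntalPåLager>5</AntalPåLager></product>
  <product><Titel>Calzone</Titel><AntalPåLager>0</AntalPåLager></product>
  <product><Titel>Margherita</Titel><AntalPåLager>5</AntalPåLager></product>
</products>
"""


def child_text(element, tag_name):
    """Hypothetical helper: text of the first <tag_name> descendant, or None.

    Checking for an empty result avoids an IndexError when a product
    has no matching element.
    """
    matches = element.getElementsByTagName(tag_name)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.nodeValue


def count_products(xml_bytes):
    """Count how many products share each title and each stock level."""
    doc = minidom.parseString(xml_bytes)
    titles = Counter()
    stock_levels = Counter()
    for product in doc.getElementsByTagName("product"):
        # In Python 3 "AntalPåLager" is a str, the same type minidom uses
        # for tag names, so the comparison inside minidom can succeed.
        titles[child_text(product, "Titel")] += 1
        stock_levels[child_text(product, "AntalPåLager")] += 1
    return titles, stock_levels


# Encode to UTF-8 bytes, as if the document had been read from a file.
titles, stock_levels = count_products(SAMPLE_XML.encode("utf-8"))
print(titles)
print(stock_levels)
```

With this design, a product that lacks one of the elements is counted under None instead of stopping the whole run with an IndexError.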
I've tried encoding the string before passing it to getElementsByTagName,
but no joy.
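Encoding the name is unlikely to help, and a short Python 3 sketch shows why. minidom's pure-Python lookup compares each element's tagName with the requested name using ==, and an encoded bytes name never equals the str tag name stored in the DOM, so the search silently returns an empty list. The one-element document is hypothetical. In Python 2.7 the analogous problem is presumably that, with the utf-8 coding line, the plain literal 'AntalPåLager' is a UTF-8 byte string while minidom's tag names are Unicode. Writing the literal as u'AntalPåLager' is the likely fix there, though this sketch does not test that assumption.

```python
from xml.dom import minidom

# Hypothetical one-element document, passed as UTF-8 bytes like file contents.
doc = minidom.parseString("<product><AntalPåLager>7</AntalPåLager></product>".encode("utf-8"))

tag_str = "AntalPåLager"               # str, the type minidom uses for tagName
tag_bytes = tag_str.encode("utf-8")    # bytes: b'AntalP\xc3\xa5Lager'

# bytes never compare equal to str in Python 3, so the encoded name matches
# no element and an empty list comes back; indexing it with [0] would raise
# an IndexError just like the one in the traceback.
by_bytes = doc.getElementsByTagName(tag_bytes)
by_str = doc.getElementsByTagName(tag_str)

print(len(by_bytes), len(by_str))
print(by_str[0].firstChild.nodeValue)
```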
Any ideas?
Many thanks!
Cheers,
Garry