[Tutor] XML parsing when elements contain foreign characters

Garry Bettle garry.bettle at gmail.com
Thu Jan 9 09:50:24 CET 2014


Howdy all,

Have you heard the news? Happy New Year!

I hope someone can help. I know this is a tutor list, so please feel free to
point me somewhere else.

I'm trying to parse some XML and I'm struggling to reference elements whose
tag names contain non-ASCII characters, such as the å in AntalPåLager.
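One quick way to see exactly which tag names minidom reports is to walk the parsed document and print each name with repr(). The sketch below is Python 3 and uses a small, made-up document in place of Export.xml, whose real structure isn't shown. In Python 3 the names come back as str, which is Unicode text.

```python
from xml.dom import minidom

# Hypothetical stand-in for Export.xml; the real file's structure is assumed.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<products>
  <product><Titel>Widget</Titel><AntalPåLager>5</AntalPåLager></product>
</products>""".encode("utf-8")


def tag_names(doc):
    """Return the distinct element tag names in document order."""
    seen = []
    # getElementsByTagName("*") matches every element in the tree.
    for element in doc.getElementsByTagName("*"):
        if element.tagName not in seen:
            seen.append(element.tagName)
    return seen


doc = minidom.parseString(SAMPLE)
for name in tag_names(doc):
    print(repr(name))
```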

Code so far:

# -*- coding: utf-8 -*-

from xml.dom import minidom

xmldoc = minidom.parse('Export.xml')
products = xmldoc.getElementsByTagName('product')
print '%s Products' % len(products)

row_cnt = 0
titles = {}
stocklevel = {}
for product in products:
  row_cnt+=1
  title=product.getElementsByTagName('Titel')[0].firstChild.nodeValue
  stock=product.getElementsByTagName('AntalPåLager')[0].firstChild.nodeValue
  if title not in titles:
    titles[title]=1
  else:
    titles[title]+=1
  if stock not in stocklevel:
    stocklevel[stock]=1
  else:
    stocklevel[stock]+=1

Traceback (most recent call last):
  File "C:\Python27\Testing Zizzi.py", line 16, in <module>
    stock=product.getElementsByTagName('AntalPÃ¥Lager')[0].firstChild.nodeValue
IndexError: list index out of range
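The traceback contains two clues. First, getElementsByTagName returns an empty NodeList when no element matches, so the [0] index is what raises the IndexError. Second, "Ã¥" is what the two UTF-8 bytes of "å" (0xC3 0xA5) look like when they are displayed as Latin-1/cp1252. That suggests the tag name in the script is a UTF-8 byte string rather than Unicode text. The Python 3 sketch below demonstrates both points on a made-up one-element document.

```python
from xml.dom import minidom

# Hypothetical one-product document standing in for Export.xml.
doc = minidom.parseString(
    "<product><AntalPåLager>5</AntalPåLager></product>".encode("utf-8")
)

# The UTF-8 bytes of "å" (0xC3 0xA5) read as Latin-1 give the "Ã¥" in the traceback.
garbled = "AntalPåLager".encode("utf-8").decode("latin-1")
print(garbled)  # AntalPÃ¥Lager

# No element has that name, so getElementsByTagName returns an empty NodeList...
matches = doc.getElementsByTagName(garbled)
print(len(matches))  # 0

# ...and indexing it with [0] raises the same IndexError as in the traceback.
try:
    matches[0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```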

I've tried encoding the string before passing it to getElementsByTagName,
but no joy.
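Encoding the name is probably why that attempt didn't help. Encoding turns text into bytes, but minidom's tag names are Unicode text, so the comparison inside getElementsByTagName never matches. The name has to be a Unicode string equal to the decoded tag. In the Python 2.7 script, that means a u'' literal such as u'AntalPåLager'; the utf-8 coding line must stay, and the file must actually be saved as UTF-8. The sketch below is Python 3, where an ordinary str literal is already Unicode. The sample document is made up.

```python
from xml.dom import minidom

# Hypothetical sample; in the real script this would be minidom.parse('Export.xml').
xml_bytes = """<?xml version="1.0" encoding="utf-8"?>
<products>
  <product><Titel>Widget</Titel><AntalPåLager>5</AntalPåLager></product>
  <product><Titel>Gadget</Titel><AntalPåLager>0</AntalPåLager></product>
</products>""".encode("utf-8")
doc = minidom.parseString(xml_bytes)

# Encoded bytes never compare equal to minidom's str tag names.
as_bytes = "AntalPåLager".encode("utf-8")
print(len(doc.getElementsByTagName(as_bytes)))  # 0

# A Unicode (str) name matches the decoded tag name.
stock_tag = "AntalPåLager"
stock = [p.getElementsByTagName(stock_tag)[0].firstChild.nodeValue
         for p in doc.getElementsByTagName("product")]
print(stock)  # ['5', '0']
```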

Any ideas?

Many thanks!

Cheers,

Garry

