[Tutor] encoding question

Alex Kleider akleider at sonic.net
Sun Jan 5 08:57:20 CET 2014


On 2014-01-04 21:20, Danny Yoo wrote:
> Oh!  That's unfortunate!  That looks like a bug on the hostip.info
> side.  Check with them about it.
> 
> 
> I can't get the source code to whatever is implementing the JSON
> response, so I can not say why the city is not being properly included
> there.
> 
> 
> [... XML rant about to start.  I am not disinterested, so my apologies
> in advance.]
> 
> ... In that case... I suppose trying the XML output is a possible
> approach.

Well, I've tried the XML approach, which seems promising, but I still get 
an encoding-related error.
Is there a bug in the xml.etree module (not very likely, methinks) or 
am I doing something wrong?
There's no denying that the whole encoding issue is still not completely 
clear to me, in spite of my having devoted a lot of time to trying to grasp 
all that's involved.

Here's what I've got:

alex@x301:~/Python/Parse$ cat ip_xml.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# file: 'ip_xml.py'

import urllib2
import xml.etree.ElementTree as ET


url_format_str = \
     u'http://api.hostip.info/?ip=%s&position=true'

def ip_info(ip_address):
     response =  urllib2.urlopen(url_format_str %\
                                    (ip_address, ))
     encoding = response.headers.getparam('charset')
     print "'encoding' is '%s'." % (encoding, )
     info = unicode(response.read().decode(encoding))
     n = info.find('\n')
     print "location of first newline is %s." % (n, )
     xml = info[n+1:]
     print "'xml' is '%s'." % (xml, )

     tree = ET.fromstring(xml)   # Here's where it blows up!!!
     root = tree.getroot()
     print "'root' is '%s', with the following children:" % (root, )
     for child in root:
         print child.tag, child.attrib
     print "END of CHILDREN"
     return info

if __name__ == "__main__":
     info = ip_info("201.234.178.62")

alex@x301:~/Python/Parse$ ./ip_xml.py
'encoding' is 'iso-8859-1'.
location of first newline is 44.
'xml' is '<HostipLookupResultSet version="1.0.1" 
xmlns:gml="http://www.opengis.net/gml" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://www.hostip.info/api/hostip-1.0.1.xsd">
  <gml:description>This is the Hostip Lookup Service</gml:description>
  <gml:name>hostip</gml:name>
  <gml:boundedBy>
   <gml:Null>inapplicable</gml:Null>
  </gml:boundedBy>
  <gml:featureMember>
   <Hostip>
    <ip>201.234.178.62</ip>
    <gml:name>Bogotá</gml:name>
    <countryName>COLOMBIA</countryName>
    <countryAbbrev>CO</countryAbbrev>
    <!-- Co-ordinates are available as lng,lat -->
    <ipLocation>
     <gml:pointProperty>
      <gml:Point srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
       <gml:coordinates>-75.2833,10.4</gml:coordinates>
      </gml:Point>
     </gml:pointProperty>
    </ipLocation>
   </Hostip>
  </gml:featureMember>
</HostipLookupResultSet>
'.
Traceback (most recent call last):
   File "./ip_xml.py", line 33, in <module>
     info = ip_info("201.234.178.62")
   File "./ip_xml.py", line 23, in ip_info
     tree = ET.fromstring(xml)
   File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
     parser.feed(text)
   File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1641, in feed
     self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in 
position 456: ordinal not in range(128)
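
The traceback points at ET.fromstring(xml), which was handed a unicode
object. Under Python 2.7 the expat parser encodes unicode input with the
default ASCII codec, and u'\xe1' (the á in Bogotá) cannot be encoded that
way. What follows is only a minimal sketch of one possible workaround:
pass the parser the undecoded bytes and let it honour the encoding named
in the XML declaration. The sample bytes, the GML constant and the
city_from_xml helper are hypothetical, and the declaration line is an
assumption about what the service sends; no network access is involved.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:gml attribute in the output above.
GML = '{http://www.opengis.net/gml}'

# Hypothetical stand-in for response.read(): undecoded ISO-8859-1 bytes,
# with the XML declaration ASSUMED to be the first line of the reply.
raw = (b'<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
       b'<HostipLookupResultSet xmlns:gml="http://www.opengis.net/gml">\n'
       b' <gml:featureMember>\n'
       b'  <Hostip>\n'
       b'   <ip>201.234.178.62</ip>\n'
       b'   <gml:name>Bogot\xe1</gml:name>\n'
       b'  </Hostip>\n'
       b' </gml:featureMember>\n'
       b'</HostipLookupResultSet>\n')


def city_from_xml(xml_bytes):
    """Parse undecoded XML bytes and return the city name as text."""
    # The parser reads the encoding from the declaration, so nothing is
    # decoded by hand and nothing is implicitly encoded as ASCII.
    root = ET.fromstring(xml_bytes)
    # Tags in the gml namespace must be looked up with the full URI prefix.
    hostip = root.find('%sfeatureMember/Hostip' % GML)
    return hostip.find('%sname' % GML).text


city = city_from_xml(raw)
print(repr(city))
```

Note that fromstring() returns the root Element directly, so the
getroot() call in the script above would fail even once the encoding
error is gone. In the script itself, the equivalent change would be to
hand response.read() to the parser without decoding it or stripping the
first line.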




