[Tutor] encoding question

Sun Jan 5 03:31:13 CET 2014

A heartfelt thank you to those of you who have given me much to ponder 
with your helpful responses.
In the meantime I've rewritten my procedure using a different approach 
altogether.  I'd be interested in knowing whether you think it's worth 
keeping, or whether you suggest I apply your revisions to my original hack.

I've been maintaining both a Python3 and a Python2.7 version.  The 
latter has actually opened my eyes to more complexities, specifically 
the need to use unicode strings rather than Python2.7's default byte 
strings (which are treated as ascii unless told otherwise).
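
A minimal sketch of the distinction, written to run unchanged on both 
Python2.7 and Python3.  The helper name try_decode and the sample city 
are made up for illustration; they are not part of ip_info.py.

```python
def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical sample value, chosen only because it has one non-ASCII letter.
city = u"S\u00e3o Paulo"        # text: 9 characters
raw = city.encode("utf-8")      # bytes: what a socket or file hands back

as_ascii = try_decode(raw, "ascii")   # fails: byte 0xC3 is outside ASCII
as_utf8 = try_decode(raw, "utf-8")    # round-trips to the original text
```

The bytes are one longer than the text because the accented letter takes 
two bytes in UTF-8, which is why decoding with the right encoding has to 
happen before the data is treated as text.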

Here it is:
alex@x301:~/Python/Parse$ cat ip_info.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re
import urllib2

url_format_str = \
     u'http://api.hostip.info/get_html.php?ip=%s&position=true'

info_exp = r"""
Country:[ ](?P<country>.*)
[\n]
City:[ ](?P<city>.*)
[\n]
[\n]
Latitude:[ ](?P<lat>.*)
[\n]
Longitude:[ ](?P<lon>.*)
[\n]
IP:[ ](?P<ip>.*)
         """
info_pattern = re.compile(info_exp, re.VERBOSE).search

def ip_info(ip_address):
     """
Returns a dictionary keyed by Country, City, Lat, Lon and IP.

Depends on http://api.hostip.info (which returns the following:
'Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\nLatitude:
38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n'.)
THIS COULD BREAK IF THE WEB SITE GOES AWAY!!!
"""
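     response = urllib2.urlopen(url_format_str % (ip_address,))
     # Fall back to UTF-8 if the server sends no charset (an assumption).
     encoding = response.headers.getparam('charset') or 'utf-8'

     # The decoded text is already unicode, so the groups need no conversion.
     info = info_pattern(response.read().decode(encoding))
     if info is None:
         raise ValueError("unexpected response format for %s" % ip_address)
     return {"Country": info.group("country"),
             "City": info.group("city"),
             "Lat": info.group("lat"),
             "Lon": info.group("lon"),
             "IP": info.group("ip")}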

if __name__ == "__main__":
     print """    IP address is %(IP)s:
         Country: %(Country)s;  City: %(City)s.
         Lat/Long: %(Lat)s/%(Lon)s""" % ip_info("201.234.178.62")
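
Since the site could go away, the parsing step can also be exercised 
without the network.  The following is only a sketch: it reuses the 
sample response quoted in the docstring, and the names SAMPLE, INFO_RE 
and parse_info are made up for the illustration.

```python
import re

# Sample response copied from the docstring of ip_info().
SAMPLE = (u"Country: UNITED STATES (US)\nCity: Santa Rosa, CA\n\n"
          u"Latitude: 38.4486\nLongitude: -122.701\nIP: 76.191.204.54\n")

# Same layout as info_exp; in VERBOSE mode the escape \n still matches
# a newline, while the literal whitespace around it is ignored.
INFO_RE = re.compile(r"""
    Country:[ ](?P<country>.*)\n
    City:[ ](?P<city>.*)\n
    \n
    Latitude:[ ](?P<lat>.*)\n
    Longitude:[ ](?P<lon>.*)\n
    IP:[ ](?P<ip>.*)
    """, re.VERBOSE)


def parse_info(text):
    """Return a dict of the named groups, or None if text does not match."""
    match = INFO_RE.search(text)
    return match.groupdict() if match else None


info = parse_info(SAMPLE)
```

Note that groupdict() returns the lower-case group names (country, city, 
lat, lon, ip) rather than the capitalised keys used above.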

Apart from soliciting your general comments, I'm also interested to know 
exactly what the line
# -*- coding: utf-8 -*-
really indicates and, more importantly, whether it is true, since I am 
using vim and I assume my files are saved as ascii.
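
As far as I understand it, this comment tells the interpreter how to 
decode the source file itself; it does not change how the editor saves 
it.  It only counts on the first or second line of the file.  A file 
containing nothing but ascii is also valid utf-8, so the declaration 
would not be false in that case.  Here is a rough sketch for checking 
whether a given line would be recognised, using the pattern quoted in 
the Python language reference; the function name declared_encoding is 
made up for the illustration.

```python
import re

# Pattern quoted from the Python language reference ("Encoding declarations").
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding named in a comment line, or None if none is found."""
    if not line.lstrip().startswith("#"):
        return None
    match = CODING_RE.search(line)
    return match.group(1) if match else None


emacs_style = declared_encoding("# -*- coding: utf-8 -*-")
vim_style = declared_encoding("# vim: set fileencoding=latin-1 :")
spaced_out = declared_encoding("# -*- coding : utf -8 -*-")
```

The last call finds nothing because the pattern wants the colon (or 
equals sign) directly after the word coding, so stray spaces there mean 
the declaration is silently ignored.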

I've discovered that with Ubuntu it's very easy to switch from English 
(US) to English (US, international with dead keys) with just two clicks 
so thanks for that tip as well.