Need some direction in completing the exercise below, appreciate any input given, thanks!

wadson.espindola at gmail.com
Tue Oct 7 15:07:53 EDT 2014


The aim of this exercise is to combine the sample database, click-tracking information from a test website and application, and information from users' social networks.

The sample database contains the following fields and is made up of 500 records.

        first_name, last_name, company_name, address, city, county, state, zip, phone1, phone2, email, web

Here are the instructions:

1) Download the US500 database from http://www.briandunning.com/sample-data/

2) Use the exchange portion of the telephone numbers (the middle three digits) as the proxy for "user clicked on and expressed interest in this topic". Identify groups of users that share topic interests (exchange numbers match).

3) Provide an API that takes an e-mail address as input, and returns the e-mail addresses of other users that share that interest.

4) Extend that API to return users within a certain "distance" N of that interest. For example, if the original user has an interest in group 236, and N is 2, return all users with interests in 234 through 238.

5) Identify and rank the states with the largest groups, and (separately) the largest number of groups.

6) Provide one or more demonstrations that the API works. These can be via a testing framework, and/or a quick-and-dirty web or command-line client, or simply by driving it from a browser and showing a raw result.
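
For reference, steps 2 through 4 can be sketched roughly as below. This is only one possible reading of the instructions, not a definitive solution: it assumes that only phone1 is used, that records are dictionaries keyed by the field names listed above, and that "distance" N means a numeric range around the user's exchange. The function names (exchange, build_index, shared_interest) and the sample rows are made up for illustration.

```python
import re
from collections import defaultdict


def exchange(phone):
    """Return the middle three digits of a US phone number as an int."""
    digits = re.sub(r"\D", "", phone)  # keep digits only
    if len(digits) < 7:
        raise ValueError("not a full phone number: %r" % phone)
    return int(digits[-7:-4])  # the three digits before the last four


def build_index(rows):
    """Build exchange -> set of emails, and email -> exchange (phone1 only)."""
    by_exchange = defaultdict(set)
    by_email = {}
    for row in rows:
        ex = exchange(row["phone1"])
        by_exchange[ex].add(row["email"])
        by_email[row["email"]] = ex
    return by_exchange, by_email


def shared_interest(email, by_exchange, by_email, n=0):
    """Emails of other users whose exchange is within n of this user's."""
    center = by_email[email]
    found = set()
    for ex in range(center - n, center + n + 1):
        found |= by_exchange.get(ex, set())
    found.discard(email)  # do not return the user who asked
    return sorted(found)


# Made-up rows for illustration; real rows would come from us-500.csv.
rows = [
    {"email": "a@example.com", "phone1": "504-236-0001"},
    {"email": "b@example.com", "phone1": "810-236-0002"},
    {"email": "c@example.com", "phone1": "856-238-0003"},
    {"email": "d@example.com", "phone1": "907-240-0004"},
]
by_exchange, by_email = build_index(rows)
print(shared_interest("a@example.com", by_exchange, by_email))
print(shared_interest("a@example.com", by_exchange, by_email, n=2))
```

With N left at 0 this answers step 3; passing n=2 widens the match to exchanges 234 through 238, as in the example in step 4. An unknown e-mail address raises KeyError here, which a real API would probably want to handle more gracefully.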


I was able to import the data this way; however, I know there's a better method using the csv module. The code below just reads lines. I'd like to split each line into its individual fields (columns) and assign primary and foreign keys in order to solve the challenge. What's the best way to accomplish this?
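
As a point of comparison, splitting each line into named fields with the csv module might look like the sketch below. It assumes Python 3 and that the first line of the file is the header row; the helper name read_rows and the two sample records are invented for illustration, and only a few of the twelve columns are shown.

```python
import csv
import io


def read_rows(handle):
    """Return one dict per record, keyed by the names in the header row."""
    return list(csv.DictReader(handle))


# Two made-up records standing in for the real file.
sample = io.StringIO(
    "first_name,last_name,state,phone1,email\n"
    '"Ann","Lee","LA","504-621-8927","ann@example.com"\n'
    '"Bo","Ng","MI","810-292-9388","bo@example.com"\n'
)
rows = read_rows(sample)
print(rows[0]["phone1"])
print([row["email"] for row in rows])

# With the real file (assumed to be us-500.csv in the current directory):
# with open("us-500.csv", newline="") as handle:
#     rows = read_rows(handle)
```

Each record then becomes a dictionary, so a field is reached by name (row["phone1"]) rather than by position. If the e-mail addresses turn out to be unique in the file, which would need checking, the email field could serve as a natural key.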

import os, csv, json, re

class fetch500(object):                          # holds the raw lines of the US500 file
    def __init__(self, path='us-500.csv'):       # reads the file when the object is created
        with open(path) as us_500file:           # the file is closed automatically
            self.lines = us_500file.readlines()

    def show(self):
        for line in self.lines:
            print(line.rstrip())                 # prints the whole line, not just phone1

data_import = fetch500()
data_import.show()
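
Step 5 of the exercise (ranking the states) could be approached along these lines. The sketch rests on an interpretation that the instructions do not spell out: a "group" is taken to be the set of users in one state who share an exchange, "largest groups" ranks states by the size of their biggest such group, and "largest number of groups" ranks them by how many distinct exchanges they contain. It also assumes phone1 is always in NNN-NNN-NNNN form. The function name rank_states and the sample rows are made up.

```python
from collections import Counter, defaultdict


def rank_states(rows):
    """Return (states ranked by largest group, states ranked by group count)."""
    sizes = defaultdict(Counter)  # state -> exchange -> number of users
    for row in rows:
        ex = row["phone1"].split("-")[1]  # assumes the NNN-NNN-NNNN format
        sizes[row["state"]][ex] += 1
    largest = {state: max(c.values()) for state, c in sizes.items()}
    counts = {state: len(c) for state, c in sizes.items()}
    # Sort descending by value, breaking ties alphabetically by state.
    by_largest = sorted(largest.items(), key=lambda kv: (-kv[1], kv[0]))
    by_count = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return by_largest, by_count


# Made-up rows for illustration only.
rows = [
    {"state": "LA", "phone1": "504-621-8927"},
    {"state": "LA", "phone1": "504-621-1111"},
    {"state": "LA", "phone1": "337-621-2222"},
    {"state": "MI", "phone1": "810-292-9388"},
    {"state": "MI", "phone1": "313-555-0100"},
    {"state": "MI", "phone1": "248-777-0101"},
]
by_largest, by_count = rank_states(rows)
print(by_largest)
print(by_count)
```

In the sample data LA has one group of three users while MI has three groups of one user each, so the two rankings come out in opposite order. If a "group" should only count when it has at least two members, the counts dictionary would need a filter on the per-exchange totals.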


