[NEWB] Dictionary instantiation?

Bruno Desthuilliers bruno.42.desthuilliers at wtf.websiteburo.oops.com
Fri Dec 7 04:13:10 EST 2007


Matt_D wrote:
> Hello there, this is my first post to the list. Only been working with
> Python for a few days. Basically a complete newbie to programming.
> 
> I'm working with csv module as an exercise to parse out a spreadsheet
> I use for work.(I am an editor for a military journalism unit) Not
> trying to do anything useful, just trying to manipulate the data.
> Anyway, here's the code I've got so far:
> 
> import csv
> import string
> import os
> 
> #Open the appropriate .csv file
> csv_file = csv.reader(open("D:\\Python25\\BNSR.csv"))
> 
> #Create blank dictionary to hold {[author]:[no. of stories]} data
> story_per_author = {}
> 
> def author_to_dict(): #Function to add each author to the dictionary
> once to get initial entry for that author

First point: your comment would be better as a docstring, which would 
also make the code more readable:

def author_to_dict():
   """Function to add each author to the dictionary
      once to get initial entry for that author
   """
>     for row in csv_file:

Second point: you're using 2 global variables. This is something to 
avoid whenever possible (that is: almost always). Here you're in the 
very typical situation of a function that produces output 
(story_per_author) depending only on its input (csv_file) - so the 
correct implementation is to pass the input as an argument and return 
the output:

>         author_count = row[-1]
>         story_per_author[author_count] = 1


def author_to_dict(csv_file):
     story_per_author = {}
     for row in csv_file:
         author_count = row[-1]
         story_per_author[author_count] = 1
     return story_per_author

Now take care: the object returned by csv.reader is not a sequence, it's 
an iterator. Once you've looped over all its content, it's exhausted.
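
A minimal sketch of that exhaustion, written with Python 3 syntax (print as a function). The in-memory sample data is made up and only stands in for BNSR.csv, whose real columns are not shown here:

```python
import csv
import io

# Hypothetical sample data standing in for BNSR.csv.
sample = io.StringIO("1,Story A,Alice\n2,Story B,Bob\n")
reader = csv.reader(sample)

first_pass = list(reader)   # consumes every row of the reader
second_pass = list(reader)  # nothing left: the iterator is exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

If several functions need the rows, one option is to read them into a list once and pass that list to each function.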

> #Fetch author names
> def rem_blank_authors(): 

Same remark as above about comments: this belongs in a docstring.

> #Function to remove entries with '' in the
> AUTHOR field of the .csv

   # Convert the open file to list format
   # for e-z mode editing

>     csv_list = list(csv_file) 

Yet another useless global.


>     for row in csv_list:
>         author_name = row[-1]
>         if author_name == '': #Find entries where no author is listed
>             csv_list.remove(row) #Remove those entries from the list

Since you don't return anything from this function, the only effect is 
to consume the whole global csv_file iterator - the csv_list object is 
discarded after function execution.
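
One possible rewrite, as a sketch only: the function takes the rows as an argument and returns a new list. The "rows" parameter and the sample data are assumptions for illustration; the author is assumed to be the last field, as in the original code. Building a new list also avoids removing items from a list while looping over it, which can skip entries.

```python
def rem_blank_authors(rows):
    """Return a new list holding only the rows whose last field
    (assumed to be the author) is not empty.
    """
    return [row for row in rows if row[-1] != '']


# Hypothetical rows; the real spreadsheet layout is not known.
rows = [["Story A", "Alice"], ["Story B", ""], ["Story C", ""], ["Story D", "Bob"]]
cleaned = rem_blank_authors(rows)
print(cleaned)  # [['Story A', 'Alice'], ['Story D', 'Bob']]
```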


> def assign_author_to_title(): #Assign an author to every title
>     author_of_title = {}
>     for row in csv_file:
>         title = row[3]
>         author = row[-1]
>         author_of_title[title] = author

Same remarks here

> 
> assign_author_to_title()
> print author_of_title

author_of_title is local to the assign_author_to_title function. You 
cannot access it from outside.

> --
> 
> Ok, the last two lines are kind of my "test the last function" test.
> Now when I run these two lines I get the error:
> 
> Traceback (most recent call last):
>   File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework
> \scriptutils.py", line 310, in RunScript
>     exec codeObject in __main__.__dict__
>   File "D:\Python25\csv_read.py", line 33, in <module>
>     print author_of_title
> NameError: name 'author_of_title' is not defined
>
> I am guessing that the author_of_title dict does not exist outside of
> the function in which it is created? 

Bingo.

> The concept of instantiation is
> sort of foreign to me so I'm having some trouble predicting when it
> happens.

It has nothing to do with instantiation, it's about scoping rules. A 
name defined in a function is local to that function. If you create an 
object in a function and want to make it available to the outside world, 
you have to return it from the function - like I did in the rewrite of 
author_to_dict - and of course assign this return value to another name 
in the caller's scope.
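
Applied to assign_author_to_title, that could look like the sketch below (Python 3 syntax). The "rows" parameter, the name "titles" and the sample data are made up for illustration; the column positions (title at index 3, author last) come from the original code.

```python
def assign_author_to_title(rows):
    """Return a dict mapping each title (row[3]) to its author (row[-1])."""
    author_of_title = {}  # local name: only visible inside this function
    for row in rows:
        author_of_title[row[3]] = row[-1]
    return author_of_title


# Hypothetical rows laid out so that the title is at index 3 and the author is last.
rows = [["1", "x", "y", "Story A", "Alice"], ["2", "x", "y", "Story B", "Bob"]]

titles = assign_author_to_title(rows)  # bind the returned dict to a name in the caller
print(titles)  # {'Story A': 'Alice', 'Story B': 'Bob'}

try:
    author_of_title  # the function's local name does not exist out here
except NameError as exc:
    print(exc)  # name 'author_of_title' is not defined
```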

> If I call the assign_author_to_title function later, am I going to be
> able to work with the author_of_title dictionary? Or is it best if I
> create author_of_title outside of my function definitions?

By all means avoid global variables. In all the above code, there's not a 
single reason to use them. Remember that functions take params and 
return values. Please take a little time to read more material about 
functions and scoping rules.

> Clearly I'm just stepping through my thought process right now,
> creating functions as I see a need for them. I'm sure the code is
> sloppy and terrible

Well... It might be better !-)

Ok, there are the usual CS-newbie-struggling-with-new-concepts errors. 
The cure is well-known: read more material (tutorials and code), 
experiment (Python is great for this - read about the '-i' option of the 
python interpreter), and post here when you run into trouble.

> but please be gentle!

Hope I haven't been too rude !-)


