[Tutor] Problem with nested for-in (Alan Gauld)

Thu Jan 29 19:27:23 CET 2009

On Thu, Jan 29, 2009 at 12:21 PM, emmanuel.delaborde
<emmanuel.delaborde at cimex.com> wrote:

> On 29 Jan 2009, at 15:26, Kent Johnson wrote:
>> What are you trying to do? Generally problems of the form
>> for item in list1:
>>  if item in list2:
>>   # do something with item
>>
>
> the first csv file is a list of rows like this: cat_code, ... other
> irrelevant fields here ...
>
> the second csv file is a list of rows like this: story_code, cat_code
> (there can be many story_codes for each cat_code)
>
> I am trying to build the list of story_code for each cat_code in the first
> file
>
> (it really is similar to a SQL join, funny thing is these CSV files ARE
> table dumps...)

OK, so make a dict that maps cat_code to a list of story_code and look
up the cat_codes in the dict. For example (untested, not explained
much either),

# Build a dict mapping each cat_code to a list of story_codes
import csv
from collections import defaultdict
cat_to_story = defaultdict(list)
lines2 = csv.reader(open("CATEGORYLIST.csv","r"))
for line2 in lines2:
  cat_to_story[line2[1]].append(line2[0])

# Now you can build old_cats easily, using dictionary lookup instead of search
# (this assumes every cat_code in lines is in the dict; with a defaultdict,
# a missing cat_code just yields an empty list)
lines = csv.reader(open("CATEGORY.csv","r"))
old_cats = [(line[0], line[2], cat_to_story[line[0]]) for line in lines]
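
To try the same idea without the real files, here is a minimal sketch that
runs on made-up data. The sample rows, the in-memory "files" and the helper
name build_cat_to_story are assumptions for illustration only; the column
positions follow the layout described above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for CATEGORYLIST.csv (story_code, cat_code)
# and CATEGORY.csv (cat_code, some field, another field).
category_list_csv = "s1,c1\ns2,c1\ns3,c2\n"
category_csv = "c1,x,News\nc2,x,Sport\nc3,x,Empty\n"

def build_cat_to_story(rows):
    """Map each cat_code to the list of its story_codes."""
    cat_to_story = defaultdict(list)
    for story_code, cat_code in rows:
        cat_to_story[cat_code].append(story_code)
    return cat_to_story

# One pass over the second file builds the lookup table.
cat_to_story = build_cat_to_story(csv.reader(io.StringIO(category_list_csv)))

# One pass over the first file does the "join" by dictionary lookup.
# .get() returns an empty list for a cat_code that has no stories,
# without adding a new key to the dict.
old_cats = [
    (row[0], row[2], cat_to_story.get(row[0], []))
    for row in csv.reader(io.StringIO(category_csv))
]
print(old_cats)
```

Each file is read exactly once, so the work grows with the total number of
rows rather than with the product of the two file sizes.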

> building the dictionary would lead to a similar iterator reset problem
> though...

No, because you build the dict once.
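
A small sketch of why that is, assuming made-up rows in place of the real
file: a csv.reader can only be consumed once, so a nested loop finds it empty
on the second pass, while a dict built from a single pass can be looked up as
often as you like.

```python
import csv
import io

data = io.StringIO("s1,c1\ns2,c1\n")  # made-up rows: story_code, cat_code
reader = csv.reader(data)

first_pass = list(reader)   # consumes the reader
second_pass = list(reader)  # the reader is exhausted, so this is empty

# Build the lookup once from the single pass; it can be reused freely.
lookup = {}
for story_code, cat_code in first_pass:
    lookup.setdefault(cat_code, []).append(story_code)

print(len(first_pass), len(second_pass), lookup)
```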

> I am not very familiar with the set data type, are you talking about its
> specific set operations like intersection, union etc... ?

Yes; or testing for membership.
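
For instance, a quick sketch with made-up cat_codes (the variable names are
only for illustration):

```python
# Hypothetical cat_codes collected from each file
cats_in_first = {"c1", "c2", "c3"}
cats_in_second = {"c1", "c2", "c9"}

# Membership test: average constant time, no scan of a list
has_c1 = "c1" in cats_in_second

# Set operations
common = cats_in_first & cats_in_second        # intersection
either = cats_in_first | cats_in_second        # union
only_in_first = cats_in_first - cats_in_second # difference

print(has_c1, sorted(common), sorted(either), sorted(only_in_first))
```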

Kent