Looking for direction

20/20 Lab lab at pacbell.net
Thu May 14 12:57:48 EDT 2015



On 05/13/2015 06:23 PM, Steven D'Aprano wrote:
> On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:
>
>> I'm a beginner to python.  Reading here and there.  Written a couple of
>> short and simple programs to make life easier around the office.
>>
>> That being said, I'm not even sure what I need to ask for. I've never
>> worked with external data before.
>>
>> I have a LARGE csv file that I need to process.  110+ columns, 72k
>> rows.  I managed to write enough to reduce it to a few hundred rows, and
>> the five columns I'm interested in.
> That's not large. Large is millions of rows, or tens of millions if you have
> enough memory. What's large to you and me is usually small to the computer.
>
> You should use the csv module for handling the CSV file, if you aren't
> already doing so. Do you need a url to the docs?
>
I actually stumbled across the csv module after coding enough to make a 
list of lists.  So that is more the reason I approached the list:  
there's nothing like spending hours (or days) coding something that already 
exists when you just don't know about it.
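
For anyone finding this thread later, here is a rough sketch of what the
csv-module version of my "list of lists" step might look like.  The header
names and the sample data are made up for illustration (my real file has
110+ columns with different headers), and read_columns is just a name I
picked, so treat it as a starting point rather than the definitive way.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
SAMPLE = """Account,Staff,Item,Qty,Noise,Extra
123,XXX,Item,2,n,a
72976,YYY,Item,5,n,b
"""

# Assumed header names for the five columns of interest.
WANTED = ["Account", "Staff", "Item", "Qty", "Noise"]


def read_columns(fileobj, wanted):
    """Return a list of lists holding only the wanted columns of each row."""
    reader = csv.DictReader(fileobj)  # uses the first line as the header
    return [[record[name] for name in wanted] for record in reader]


rows = read_columns(io.StringIO(SAMPLE), WANTED)
```

With a real file, the io.StringIO object would be replaced by something
like open("data.csv", newline="") inside a with block.  Note that every
value comes back as a string, so Qty still needs int() before any
arithmetic.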
>> Now is where I have my problem:
>>
>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>>              [72976, "YYY", "Item", "Qty", "Noise"],
>>              [123, "XXX", "ItemTypo", "Qty", "Noise"]    ]
>>
>> Basically, I need to check for rows with duplicate accounts (row[0]) and
>> staff (row[1]), and if so, remove that row, and add its Qty to the
>> original row. I really don't have a clue how to go about this.
> Is the order of the rows important? If not, the problem is simpler.
>
>
> processed = {}  # hold the processed data in a dict
>
> for row in myList:
>      account, staff = row[0:2]
>      key = (account, staff)  # Put them in a tuple.
>      if key in processed:
>          # We've already seen this combination.
>          processed[key][3] += row[3]  # Add the quantities.
>      else:
>          # Never seen this combination before.
>          processed[key] = row
>
> newlist = list(processed.values())
>
>
> Does that help?
>
>
>
It does, immensely.  I'll make this work.  Thank you again for the link 
from yesterday and apologies for hitting the wrong reply button.  I'll 
have to study more on the usage and implementations of dictionaries and 
tuples.
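
As a note to myself, here is a minimal sketch of how I expect to adapt the
suggestion above.  It makes two assumptions beyond what was posted: the Qty
column arrives as a string (as it would from the csv module) and so needs
int(), and each first-seen row is copied so the original list is left
untouched.  The function name merge_duplicates and the sample quantities
are invented for the example.

```python
def merge_duplicates(rows):
    """Combine rows sharing (account, staff), summing the Qty column."""
    processed = {}  # maps (account, staff) -> merged row
    for row in rows:
        key = (row[0], row[1])  # a tuple can be a dict key; a list cannot
        if key in processed:
            # Already seen this combination: add this row's quantity.
            processed[key][3] += int(row[3])
        else:
            # First occurrence: store a copy with Qty converted to int.
            merged = list(row)
            merged[3] = int(merged[3])
            processed[key] = merged
    return list(processed.values())


my_list = [
    [123, "XXX", "Item", "2", "Noise"],
    [72976, "YYY", "Item", "5", "Noise"],
    [123, "XXX", "ItemTypo", "3", "Noise"],
]
new_list = merge_duplicates(my_list)
```

The first row seen for each account/staff pair is the one that survives,
so the "ItemTypo" row only contributes its quantity.  Output order follows
first appearance only where dicts preserve insertion order (Python 3.7 and
later); on older versions collections.OrderedDict would be needed for that.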


