Looking for direction

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed May 13 21:23:09 EDT 2015


On Thu, 14 May 2015 09:24 am, 20/20 Lab wrote:

> I'm a beginner to python.  Reading here and there.  Written a couple of
> short and simple programs to make life easier around the office.
> 
> That being said, I'm not even sure what I need to ask for. I've never
> worked with external data before.
> 
> I have a LARGE csv file that I need to process.  110+ columns, 72k
> rows.  I managed to write enough to reduce it to a few hundred rows, and
> the five columns I'm interested in.

That's not large. Large is millions of rows, or tens of millions if you have
enough memory. What's large to you and me is usually small to the computer.

You should use the csv module for handling the CSV file, if you aren't
already doing so. Do you need a URL to the docs?
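For reference, a minimal sketch of reading with the csv module. The column
names and sample data here are invented for illustration, since the real
file isn't shown:

```python
import csv
import io

# Stand-in for the real file; the actual column names are unknown,
# so these are made up.
data = "account,staff,item,qty,noise\n123,XXX,Widget,5,x\n72976,YYY,Gadget,3,x\n"

reader = csv.reader(io.StringIO(data))
rows = list(reader)
header, body = rows[0], rows[1:]
# Every field comes back as a string; convert qty with int() when summing.
```

With a real file you'd pass an open file object to csv.reader() instead of
the io.StringIO stand-in.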


> Now is where I have my problem:
> 
> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
>             [72976, "YYY", "Item", "Qty", "Noise"],
>             [123, "XXX", "ItemTypo", "Qty", "Noise"]    ]
> 
> Basically, I need to check for rows with duplicate accounts (row[0]) and
> staff (row[1]), and if so, remove that row and add its Qty to the
> original row. I really don't have a clue how to go about this.

Is the order of the rows important? If not, the problem is simpler.


processed = {}  # hold the processed data in a dict

for row in myList:
    account, staff = row[0:2]
    key = (account, staff)  # Put them in a tuple.
    if key in processed:
        # We've already seen this combination.
        processed[key][3] += row[3]  # Add the quantities.
    else:
        # Never seen this combination before.
        processed[key] = row

newlist = list(processed.values())
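Run against the sample data, the idea looks like this. Note I've substituted
numeric quantities for the "Qty" placeholder strings, since += only makes
sense on numbers here:

```python
myList = [
    [123, "XXX", "Item", 5, "Noise"],
    [72976, "YYY", "Item", 3, "Noise"],
    [123, "XXX", "Item", 2, "Noise"],
]

processed = {}  # hold the processed data in a dict

for row in myList:
    key = (row[0], row[1])  # (account, staff) as a tuple key
    if key in processed:
        # Already seen this combination: accumulate the quantity.
        processed[key][3] += row[3]
    else:
        # Never seen this combination before: keep the row.
        processed[key] = row

newlist = list(processed.values())
# The two 123/"XXX" rows collapse into one row with quantity 7.
```

One caveat: a plain dict doesn't promise to preserve insertion order in the
Python versions current as of this post, so if the original row order
matters, use collections.OrderedDict instead.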


Does that help?



-- 
Steven



