Looking for direction

Wed May 13 21:12:53 EDT 2015

On 05/13/2015 08:45 PM, 20/20 Lab wrote:

You accidentally replied to me, rather than the mailing list.  Please 
use reply-list, or if your mailer can't handle that, do a Reply-All, and 
remove the parts you don't want.

 >
 > On 05/13/2015 05:07 PM, Dave Angel wrote:
 >> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
 >>> I'm a beginner to python.  Reading here and there.  Written a couple of
 >>> short and simple programs to make life easier around the office.
 >>>
 >> Welcome to Python, and to this mailing list.
 >>
 >>> That being said, I'm not even sure what I need to ask for. I've never
 >>> worked with external data before.
 >>>
 >>> I have a LARGE csv file that I need to process.  110+ columns, 72k
 >>> rows.
 >>
 >> That's not very large at all.
 >>
 > In the grand scheme, I guess not.  However I'm currently doing this
 > whole process using office.  So it can be a bit daunting.

I'm not familiar with the "office" operating system.

 >>>  I managed to write enough to reduce it to a few hundred rows, and
 >>> the five columns I'm interested in.
 >>
 >>>
 >>> Now is where I have my problem:
 >>>
 >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
 >>>             [72976, "YYY", "Item", "Qty", "Noise"],
 >>>             [123, "XXX", "ItemTypo", "Qty", "Noise"]    ]
 >>>
 >>
 >> It'd probably be useful to identify names for your columns, even if
 >> it's just in a comment.  Guessing from the paragraph below, I figure
 >> the first two columns are "account" & "staff"
 >
 > The columns that I pull are Account, Staff, Item Sold, Quantity sold,
 > and notes about the sale (notes aren't particularly needed, but the
 > higher ups would like them in the report).
 >>
 >>> Basically, I need to check for rows with duplicate accounts (row[0]) and
 >>> staff (row[1]), and if so, remove that row, and add its Qty to the
 >>> original row.
 >>
 >> And which column is that supposed to be?  Shouldn't there be a number
 >> there, rather than a string?
 >>
 >>> I really don't have a clue how to go about this.  The
 >>> number of rows changes based on which run it is, so I couldn't even get
 >>> away with using hundreds of compare loops.
 >>>
 >>> If someone could point me to some documentation on the functions I
 >>> would need, or a tutorial, it would be a great help.
 >>>
 >>
 >> Is the order significant?  Do you have to preserve the order that the
 >> accounts appear?  I'll assume not.
 >>
 >> Have you studied dictionaries?  Seems to me the way to handle the
 >> problem is to read in a row, create a dictionary with key of (account,
 >> staff), and data of the rest of the line.
 >>
 >> Each time you read a row, you check if the key is already in the
 >> dictionary.  If not, add it.  If it's already there, merge the data as
 >> you say.
 >>
 >> Then when you're done, turn the dict back into a list of lists.
 >>
 > The order is irrelevant.  No, I've not really studied dictionaries, but
 > a few people have mentioned it.  I'll have to read up on them and, more
 > importantly, their applications.  Seems that they are more versatile
 > than I thought.
 >
 > Thank you.

You have to realize that a tuple can be used as a key, in your case a 
tuple of Account and Staff.
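
A tiny sketch of what that looks like in practice.  The name totals and 
the values are made up purely for illustration:

```python
# A dict whose keys are (account, staff) tuples.
totals = {}

key = (123, "XXX")       # tuples are hashable, so they work as dict keys
totals[key] = 2          # store some quantity under that key

print((123, "XXX") in totals)   # an equal tuple finds the same entry
print((123, "YYY") in totals)   # a different staff value is a different key
```

Two tuples with the same contents compare equal, so you never need to 
hang on to the original key object to look the row up again.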

You'll have to decide how you're going to merge the ItemSold, 
QuantitySold, and notes.
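
Here is one rough way the whole thing might go.  It's only a sketch, and 
it makes two assumptions you'll want to check: that the quantities are 
already numbers (your sample shows strings there), and that you simply 
keep the item and notes from the first row seen for each key.  The name 
merge_rows is just something I made up.

```python
def merge_rows(rows):
    """Combine rows sharing (account, staff), summing their quantities.

    Assumes each row is [account, staff, item, qty, notes] and that
    qty is numeric.  Item and notes come from the first row seen.
    """
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)
        if key not in merged:
            # First time we see this account/staff pair: remember the row.
            merged[key] = [item, qty, notes]
        else:
            # Duplicate: fold its quantity into the row we already have.
            merged[key][1] += qty
    # Turn the dict back into a list of lists.
    return [[account, staff] + rest
            for (account, staff), rest in merged.items()]


rows = [[123, "XXX", "Item", 2, "Noise"],
        [72976, "YYY", "Item", 1, "Noise"],
        [123, "XXX", "ItemTypo", 3, "Noise"]]
result = merge_rows(rows)
```

With that made-up data, the two rows for (123, "XXX") collapse into one 
with a quantity of 5.  If you'd rather combine the notes or flag 
mismatched items, the else branch is the place to do it.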

-- 
DaveA
