Looking for direction

20/20 Lab lab at pacbell.net
Thu May 14 11:58:48 EDT 2015



On 05/13/2015 06:12 PM, Dave Angel wrote:
> On 05/13/2015 08:45 PM, 20/20 Lab wrote:
>
> You accidentally replied to me, rather than the mailing list. Please 
> use reply-list, or if your mailer can't handle that, do a Reply-All, 
> and remove the parts you don't want.
>
> >
> > On 05/13/2015 05:07 PM, Dave Angel wrote:
> >> On 05/13/2015 07:24 PM, 20/20 Lab wrote:
> >>> I'm a beginner to python.  Reading here and there. Written a couple of
> >>> short and simple programs to make life easier around the office.
> >>>
> >> Welcome to Python, and to this mailing list.
> >>
> >>> That being said, I'm not even sure what I need to ask for. I've never
> >>> worked with external data before.
> >>>
> >>> I have a LARGE csv file that I need to process.  110+ columns, 72k
> >>> rows.
> >>
> >> That's not very large at all.
> >>
> > In the grand scheme, I guess not.  However I'm currently doing this
> > whole process using office.  So it can be a bit daunting.
>
> I'm not familiar with the "office" operating system.
>
> >>>  I managed to write enough to reduce it to a few hundred rows, and
> >>> the five columns I'm interested in.
> >>
> >>>
> >>> Now is where I have my problem:
> >>>
> >>> myList = [ [123, "XXX", "Item", "Qty", "Noise"],
> >>>             [72976, "YYY", "Item", "Qty", "Noise"],
> >>>             [123, "XXX", "ItemTypo", "Qty", "Noise"]    ]
> >>>
> >>
> >> It'd probably be useful to identify names for your columns, even if
> >> it's just in a comment.  Guessing from the paragraph below, I figure
> >> the first two columns are "account" & "staff"
> >
> > The columns that I pull are Account, Staff, Item Sold, Quantity sold,
> > and notes about the sale (notes aren't particularly needed, but the
> > higher ups would like them in the report)
> >>
> >>> Basically, I need to check for rows with duplicate accounts (row[0]) and
> >>> staff (row[1]), and if so, remove that row, and add its Qty to the
> >>> original row.
> >>
> >> And which column is that supposed to be?  Shouldn't there be a number
> >> there, rather than a string?
> >>
> >>> I really don't have a clue how to go about this.  The
> >>> number of rows changes based on which run it is, so I couldn't even get
> >>> away with using hundreds of compare loops.
> >>>
> >>> If someone could point me to some documentation on the functions I would
> >>> need, or a tutorial it would be a great help.
> >>>
> >>
> >> Is the order significant?  Do you have to preserve the order that the
> >> accounts appear?  I'll assume not.
> >>
> >> Have you studied dictionaries?  Seems to me the way to handle the
> >> problem is to read in a row, create a dictionary with key of (account,
> >> staff), and data of the rest of the line.
> >>
> >> Each time you read a row, you check if the key is already in the
> >> dictionary.  If not, add it.  If it's already there, merge the data as
> >> you say.
> >>
> >> Then when you're done, turn the dict back into a list of lists.
> >>
> > The order is irrelevant.  No, I've not really studied dictionaries, but
> > a few people have mentioned it.  I'll have to read up on them and, more
> > importantly, their applications.  Seems that they are more versatile
> > than I thought.
> >
> > Thank you.
>
> You have to realize that a tuple can be used as a key, in your case a 
> tuple of Account and Staff.
>
> You'll have to decide how you're going to merge the ItemSold, 
> QuantitySold, and notes.
>
Tells you how often I actually talk in mailing lists.  My apologies, and 
thank you again.
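
For reference, here is a minimal sketch of the dictionary approach described
in the quoted text above. It assumes the Qty column holds numbers rather than
strings, and that on a duplicate (Account, Staff) pair the first row's Item
and Notes are kept and only the quantities are summed. That merge rule and
the name merge_rows are illustrative assumptions, not something settled in
this thread.

```python
def merge_rows(rows):
    """Combine rows that share (account, staff), summing their quantities."""
    merged = {}
    for account, staff, item, qty, notes in rows:
        key = (account, staff)  # a tuple is hashable, so it can be a dict key
        if key not in merged:
            # First time this pair is seen: store a fresh copy of the row.
            merged[key] = [account, staff, item, qty, notes]
        else:
            # Duplicate pair: add the quantity to the row already stored.
            # Assumption: the first row's item and notes are kept as-is.
            merged[key][3] += qty
    # Turn the dict back into a list of lists.
    return list(merged.values())


# Sample data shaped like the example in the thread, with numeric quantities.
rows = [
    [123, "XXX", "Item", 2, "Noise"],
    [72976, "YYY", "Item", 1, "Noise"],
    [123, "XXX", "ItemTypo", 3, "Noise"],
]
result = merge_rows(rows)
```

Since the order of the output does not matter here, a plain dict is enough.
If the quantities arrive as strings from the csv module, they would need to
be converted with int() or float() before being summed.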


