Easier way to do this?

Thomas Jollans tjol at tjol.eu
Wed Oct 4 16:11:46 EDT 2017


On 04/10/17 18:42, 20/20 Lab wrote:
> Looking for advice for what looks to me like clumsy code.
>
> I have a large csv (effectively garbage) dump.  I have to pull out
> sales information per employee and count them by price range. I've got
> my code working, but I'm thinking there must be a more refined way of
> doing this.
>
> ---snippet of what I have---
>
> EMP1 = [0,0]
> EMP2 = [0,0]
> EMP3 = [0,0]
>
> for line in (inputfile):
>     content = line.split(",")
First of all, you don't want to make a habit of parsing common file
formats by hand. CSV looks simple enough, but there are edge cases, such
as quoted fields that contain commas or line breaks, that might come up
when you're least expecting them. The stdlib has a csv module:

https://docs.python.org/3/library/csv.html
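
As a small illustration of the kind of edge case meant here, the sketch
below compares a plain split(",") with csv.reader on one row. The sample
row is invented for the example and is not taken from the original dump.

```python
import csv
import io

# Invented sample row: the second field contains a comma inside quotes.
sample = 'EMP1,"Widget, large",120.00\n'

# Naive splitting cuts the quoted field in two.
naive_fields = sample.rstrip("\n").split(",")

# csv.reader understands the quoting and keeps the field intact.
parsed_fields = next(csv.reader(io.StringIO(sample)))

print(len(naive_fields))  # 4
print(parsed_fields)      # ['EMP1', 'Widget, large', '120.00']
```

With a row like this, every column index after the quoted field is off
by one in the hand-split version, which is easy to miss in a large file.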

>     if content[18] == "EMP1":
>         if float(content[24]) < 99.75:
>             EMP1[0] += 1
>         elif float(content[24]) > 99.74:
>             EMP1[1] += 1
>     if content[18] == "EMP2":
>         if float(content[24]) < 99.75:
>             EMP2[0] += 1
>         elif float(content[24]) > 99.74:
>             EMP2[1] += 1
>     if content[18] == "EMP3":
>         if float(content[24]) < 99.75:
>             EMP3[0] += 1
>         elif float(content[24]) > 99.74:
>             EMP3[1] += 1

This just cries out for something like pandas (everyone's favourite data
analysis library).

Be warned: pandas is part of the scientific Python stack, which is
immensely powerful and popular, but it does have a distinctive style
that may appear cryptic if you're used to the way the rest of the world
writes Python.

Your code could end up looking something like this:

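import pandas as pd

# csv_filename, 'employee' and 'cryptic_number' are placeholders for your
# file and for the columns at index 18 and 24.
employee_data = pd.read_csv(csv_filename)
# Complementary tests, so a price such as 99.745 is counted only once.
employee_data['is_fantastic'] = employee_data['cryptic_number'] < 99.75
employee_data['is_great'] = employee_data['cryptic_number'] >= 99.75
# Summing booleans counts the True values per employee.
by_employee = employee_data.groupby('employee')[['is_great', 'is_fantastic']].sum()
greatness_dict = dict(by_employee['is_great'])
fantasy_dict = dict(by_employee['is_fantastic'])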
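
If pulling in pandas is more than you want, the same tally can be
sketched with the stdlib alone. This is a different technique from the
pandas version: csv.reader plus a defaultdict keyed by whatever employee
names appear in the file. The column indices (18 and 24) and the 99.75
cut-off come from the original snippet. The names count_sales and
make_row, the demo data, and the decision to skip malformed rows
silently are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

THRESHOLD = 99.75   # cut-off used in the original snippet
EMPLOYEE_COL = 18   # column indices used in the original snippet
PRICE_COL = 24


def count_sales(lines):
    """Return {employee: [count below THRESHOLD, count at or above it]}."""
    counts = defaultdict(lambda: [0, 0])
    for row in csv.reader(lines):
        if len(row) <= PRICE_COL:
            continue  # assumption: ignore rows that are too short
        try:
            price = float(row[PRICE_COL])
        except ValueError:
            continue  # assumption: ignore rows without a numeric price
        bucket = 0 if price < THRESHOLD else 1
        counts[row[EMPLOYEE_COL]][bucket] += 1
    return dict(counts)


def make_row(employee, price):
    """Build a 25-column demo line (hypothetical helper for this example)."""
    fields = [""] * 25
    fields[EMPLOYEE_COL] = employee
    fields[PRICE_COL] = price
    return ",".join(fields) + "\n"


# Invented demo data standing in for the real dump.
demo = io.StringIO(
    make_row("EMP1", "50.00")
    + make_row("EMP1", "120.00")
    + make_row("EMP2", "99.75")
    + make_row("EMP3", "n/a")  # non-numeric price, skipped
)
result = count_sales(demo)
print(result)  # {'EMP1': [1, 1], 'EMP2': [0, 1]}
```

For a real file you would pass an open file object instead of the
StringIO, opened with newline="" as the csv documentation recommends.
Employees are picked up from the data as they appear, so there is no
list of names to maintain.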


>
> and repeat if statements for the rest of 25+ employees.  I can make a
> list of the employees, but I'd prefer to pull them from the csv, as
> our turnover is rather high (however this is not important).  I'm
> thinking another "for employee in content[18]" should be there, but
> when I tried, my numbers were incorrect.
>
> Any help / advice is appreciated,
>
> Matt
>




