From an existing Pandas DataFrame, how can I create a summary DataFrame based on the union of overlapping date ranges (given a start and an end date) and an additional column?

joseph pareti joepareti54 at gmail.com
Tue Jun 9 05:44:46 EDT 2020


i gave it a shot, see attached

Am Mi., 3. Juni 2020 um 23:38 Uhr schrieb Aaron <aaron.christensen at gmail.com
>:

> Hello,
>
> Given a dateframe with trips made by employees of different companies, I am
> trying to generate a new dataframe with only the company names.  I am
> looking to combine the overlapping travel times from employees of the SAME
> company into a single row.  If there are no overlapping travel times, then
> that row just transfers over as-is.  When there are overlapping travel
> times, then the following will happen:
>
> --The name field is removed b/c that is no longer relevant (company name
> stays), the Depart date will be the earliest date of any of the trip dates
> regardless of the employee, the Return date will be the latest date of any
> of the trip dates regardless of the employee, the charges for the trip will
> be summed
>
> For example, if trips had dates 01/01/20 - 01/31/20, 01/15/20 - 02/15/20,
> 02/01-20 - 02/28/20, then all three would be combined.  The starting date
> will be 1/1/20 and ending as of 2/28/20.  Basically, the company was on
> that trip from start to finish… kinda like a relay run handing off the
> baton.  Also, the charges will be summed for each of those trips and
> transferred over to the single row.
>
> Here is the starting dataframe code/output (note: the row order is
> typically not already sorted by company name as in this example):
>
> import pandas as pd
>
>
> emp_trips = {'Name': ['Bob','Joe','Sue','Jack', 'Henry', 'Frank',
> 'Lee', 'Jack'],
>         'Company': ['ABC', 'ABC', 'ABC', 'HIJ', 'HIJ', 'DEF', 'DEF',
> 'DEF'],
>         'Depart' : ['01/01/2020', '01/01/2020', '01/06/2020',
> '01/01/2020', '05/01/2020', '01/13/2020', '01/12/2020', '01/14/2020'],
>         'Return' : ['01/31/2020', '02/15/2020', '02/20/2020',
> '03/01/2020', '05/05/2020', '01/15/2020', '01/30/2020', '02/02/2020'],
>         'Charges': [10.10, 20.25, 30.32, 40.00, 50.01, 60.32, 70.99, 80.87]
>         }
>
> df = pd.DataFrame(emp_trips, columns = ['Name', 'Company', 'Depart',
> 'Return', 'Charges'])
> # Convert to date format
> df['Return']= pd.to_datetime(df['Return'])
> df['Depart']= pd.to_datetime(df['Depart'])
>
>   Name Company     Depart     Return  Charges0    Bob     ABC
> 2020-01-01 2020-01-31    10.101    Joe     ABC 2020-01-01 2020-02-15
>  20.252    Sue     ABC 2020-01-06 2020-02-20    30.323   Jack     HIJ
> 2020-01-01 2020-03-01    40.004  Henry     HIJ 2020-05-01 2020-05-05
>  50.015  Frank     DEF 2020-01-13 2020-01-15    60.326    Lee     DEF
> 2020-01-12 2020-01-30    70.997   Jack     DEF 2020-01-14 2020-02-02
>  80.87
>
> And, here is the desired/generated dataframe:
>
>   Company      Depart      Return  Charges0     ABC  01/01/2020
> 02/20/2020    60.671     HIJ  01/01/2020  03/01/2020    40.002     HIJ
>  05/01/2020  05/05/2020    50.013     DEF  01/12/2020  02/02/2020
> 212.18
>
> I have been trying to use a combination of sorting and grouping but
> the best I've achieved is reordering the dataframe.  Even though I am
> able to sort/group based on values, I still run into the issues of
> finding overlapping date ranges and pulling out all trips based on a
> single company per aggregate/overlapping date range.
>
> Thank you in advance for any help!
>
> Aaron
> --
> https://mail.python.org/mailman/listinfo/python-list
>


-- 
Regards,
Joseph Pareti - Artificial Intelligence consultant
Joseph Pareti's AI Consulting Services
https://www.joepareti54-ai.com/
cell +49 1520 1600 209
cell +39 339 797 0644


More information about the Python-list mailing list