How to replace a cell value with each of its contour cells and yield the corresponding datasets separately in a list, the Pandas way?

marc nicole mk1853387 at gmail.com
Sun Jan 21 13:25:17 EST 2024


It is part of a larger project that processes data according to a
given algorithm.
Do you have any comments or recommendations for improving the code?

Thanks.

On Sun, Jan 21, 2024 at 18:28, Thomas Passin via Python-list
<python-list at python.org> wrote:

> On 1/21/2024 11:54 AM, marc nicole wrote:
> > Thanks for the reply,
> >
> > I think using a Pandas (or a Numpy) approach would optimize the
> > execution of the program.
> >
> > Target cells could be up to 10% of the size of the dataset; a good
> > example to start with would have from 10 to 100 values.
>
> Thanks for the reformatted code.  It's much easier to read and think about.
>
> For say 100 points, it doesn't seem that "optimization" would be much of
> an issue.  On my laptop machine and Python 3.12, your example takes
> around 5 seconds to run and print().  OTOH if you think you will go to
> much larger datasets, certainly execution time could become a factor.
>
> I would think that NumPy arrays and/or matrices would have good potential.
>
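On the NumPy suggestion above, here is a minimal sketch of how the four contour values of every target cell might be looked up in one vectorized step. It is only an illustration under stated assumptions: the names `OFFSETS` and `contour_values` are hypothetical (they are not in the posted code), cells are given as (row, col) rather than the (x, y) order used in the posted example, and off-grid neighbours are reported as NaN instead of being skipped.

```python
import numpy as np

# Offsets of the four contour (edge-adjacent) cells, as (row, col) deltas.
# Hypothetical name, not taken from the posted code.
OFFSETS = np.array([(0, 1), (1, 0), (0, -1), (-1, 0)])


def contour_values(values, targets):
    """Return the neighbour values of each target, shape (n_targets, 4).

    Neighbours that fall outside the grid are reported as NaN (an assumption).
    """
    targets = np.asarray(targets)                      # shape (n, 2)
    neighbours = targets[:, None, :] + OFFSETS[None]   # shape (n, 4, 2)
    rows, cols = neighbours[..., 0], neighbours[..., 1]
    inside = (
        (rows >= 0) & (rows < values.shape[0])
        & (cols >= 0) & (cols < values.shape[1])
    )
    out = np.full(rows.shape, np.nan)
    # Fancy indexing reads all in-bounds neighbours at once.
    out[inside] = values[rows[inside], cols[inside]]
    return out


grid = np.arange(12, dtype=float).reshape(3, 4)
print(contour_values(grid, [(0, 0), (1, 2)]))
```

For the 3x4 grid above, the first row of the result is [1, 4, nan, nan] (the corner cell has only two neighbours) and the second row is [7, 10, 5, 2].
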
> Is this some kind of a cellular automaton, or an image filtering process?
>
> > Let me know your thoughts. Here's a reproducible example, which I
> > formatted:
> >
> >
> >
> > from numpy import random
> > import pandas as pd
> > import numpy as np
> > import operator
> > import math
> > from collections import deque
> > from queue import Queue
> > from itertools import product
> >
> >
> > def select_target_values(dataframe, number_of_target_values):
> >      target_cells = []
> >      for _ in range(number_of_target_values):
> >          row_x = random.randint(0, len(dataframe.columns) - 1)
> >          col_y = random.randint(0, len(dataframe) - 1)
> >          target_cells.append((row_x, col_y))
> >      return target_cells
> >
> >
> > def select_contours(target_cells):
> >      contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
> >      contour_cells = []
> >      for target_cell in target_cells:
> >          # random contour count (1 to 4) for each cell;
> >          # NumPy's randint excludes the upper bound
> >          contour_cells_count = random.randint(1, 5)
> >          try:
> >              contour_cells.append(
> >                  [
> >                      tuple(
> >                          map(
> >                              lambda i, j: i + j,
> >                              (target_cell[0], target_cell[1]),
> >                              contour_coordinates[iteration_],
> >                          )
> >                      )
> >                      for iteration_ in range(contour_cells_count)
> >                  ]
> >              )
> >          except IndexError:
> >              continue
> >      return contour_cells
> >
> >
> > def create_zipf_distribution():
> >      zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
> >
> >      zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
> >
> >      return zipf_distribution_dataset
> >
> >
> > def apply_contours(target_cells, contour_cells):
> >      target_cells_with_contour = []
> >      # create one single list of cells
> >      for idx, target_cell in enumerate(target_cells):
> >          target_cell_with_contour = [target_cell]
> >          target_cell_with_contour.extend(contour_cells[idx])
> >          target_cells_with_contour.append(target_cell_with_contour)
> >      return target_cells_with_contour
> >
> >
> > def create_possible_datasets(dataframe, target_cells_with_contour):
> >      all_datasets_final = []
> >      dataframe_original = dataframe.copy()
> >
> >      list_tuples_idx_cells_all_datasets = list(
> >          filter(
> >              lambda x: x,
> >              [list(tuples) for tuples in
> >               list(product(*target_cells_with_contour))],
> >          )
> >      )
> >      target_original_cells_coordinates = list(
> >          map(
> >              lambda x: x[0],
> >              [
> >                  target_and_contour_cell
> >                  for target_and_contour_cell in target_cells_with_contour
> >              ],
> >          )
> >      )
> >      for dataset_index_values in list_tuples_idx_cells_all_datasets:
> >          all_datasets = []
> >          for idx_cell in range(len(dataset_index_values)):
> >              dataframe_cpy = dataframe.copy()
> >              dataframe_cpy.iat[
> >                  target_original_cells_coordinates[idx_cell][1],
> >                  target_original_cells_coordinates[idx_cell][0],
> >              ] = dataframe_original.iloc[
> >                  dataset_index_values[idx_cell][1],
> >                  dataset_index_values[idx_cell][0],
> >              ]
> >              all_datasets.append(dataframe_cpy)
> >          all_datasets_final.append(all_datasets)
> >      return all_datasets_final
> >
> >
> > def main():
> >      zipf_dataset = create_zipf_distribution()
> >
> >      target_cells = select_target_values(zipf_dataset, 5)
> >      print(target_cells)
> >      contour_cells = select_contours(target_cells)
> >      print(contour_cells)
> >      target_cells_with_contour = apply_contours(
> >          target_cells, contour_cells
> >      )
> >      datasets = create_possible_datasets(
> >          zipf_dataset, target_cells_with_contour
> >      )
> >      print(datasets)
> >
> >
> > main()
> >
> > On Sun, Jan 21, 2024 at 16:33, Thomas Passin via Python-list
> > <python-list at python.org> wrote:
> >
> >     On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> >      > Hello,
> >      >
> >      > I have an initial dataframe with a random list of target cells
> >      > (each cell being identified by a pair (x, y)).
> >      > I want to yield four different dataframes, each containing the
> >      > value of one of the contour (surrounding) cells of each specified
> >      > target cell.
> >      >
> >      > The surrounding cells to consider for a specific target cell are
> >      > (x-1, y), (x, y-1), (x+1, y), (x, y+1); specifically, I randomly
> >      > choose 1 to 4 of these cells and consider them as replacements
> >      > for the target cell.
> >      >
> >      > I want to do that through a pandas-specific approach, without
> >      > having to define the contour cells separately and then apply the
> >      > changes to the dataframe
> >
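The contour selection described in the question could be sketched in plain Python as follows. This is an illustration under assumptions, not the posted method: `CONTOUR_OFFSETS`, `pick_contour_cells`, `n_cols`, `n_rows` and `rng` are hypothetical names, the sketch draws a random subset of the neighbours (the posted code takes the first few offsets in a fixed order), and cells on the border simply have fewer than four candidates.

```python
import random

# (dx, dy) offsets of the four contour cells named in the question.
CONTOUR_OFFSETS = [(-1, 0), (0, -1), (1, 0), (0, 1)]


def pick_contour_cells(cell, n_cols, n_rows, rng=random):
    """Pick 1 to 4 in-bounds contour cells of `cell`, given as (x, y)."""
    x, y = cell
    # Keep only the neighbours that actually exist in the grid.
    candidates = [
        (x + dx, y + dy)
        for dx, dy in CONTOUR_OFFSETS
        if 0 <= x + dx < n_cols and 0 <= y + dy < n_rows
    ]
    if not candidates:
        return []
    # The stdlib randint includes both end points, unlike NumPy's.
    count = rng.randint(1, len(candidates))
    return rng.sample(candidates, count)


print(pick_contour_cells((0, 0), n_cols=5, n_rows=50))
```

Filtering out-of-range neighbours up front avoids negative indices, which pandas and NumPy would otherwise interpret as counting from the end.
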
> >     1. Why do you want a Pandas-specific approach?  Many people would
> >     rather keep code independent of special libraries if possible;
> >
> >     2. How big can these collections of target cells be, roughly
> >     speaking?  The size could make a big difference in picking a design;
> >
> >     3. You really should work on formatting code for this list.  Your
> >     code below is very complex and would take a lot of work to reformat
> >     to the point where it is readable, especially with the nearly
> >     impenetrable arguments in some places.  Probably all that is needed
> >     is to replace all tabs by (say) three spaces, and to make sure you
> >     intentionally break lines well before they might get word-wrapped.
> >     Here is one example I have reformatted (I hope I got this right):
> >
> >     list_tuples_idx_cells_all_datasets = list(filter(
> >          lambda x: utils_tuple_list_not_contain_nan(x),
> >          [list(tuples) for tuples in list(
> >                itertools.product(*target_cells_with_contour))
> >          ]))
> >
> >     4. As an aside, it doesn't look like you need to convert all those
> >     sequences and iterators to lists all over the place;
> >
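Regarding point 4, here is a small sketch of consuming `itertools.product` directly instead of wrapping it in lists. The input data is made up for illustration; only the variable name `target_cells_with_contour` comes from the posted code.

```python
from itertools import product

# Made-up input: each inner list is a target cell followed by its contour cells.
target_cells_with_contour = [
    [(1, 1), (1, 2), (2, 1)],
    [(3, 4), (3, 5)],
]

# product() already returns an iterator, so it can be looped over directly
# without building list(product(...)) and then a list of lists from it.
for combination in product(*target_cells_with_contour):
    print(combination)
```

Each `combination` is a tuple with one cell per target, so indexing it works the same way as with the list version; a list is only needed if the combinations must be traversed more than once.
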
> >
> >      > (but rather using an all in one approach):
> >      > for now I have written this example which I think is not
> >      > Pandas specific:
> >     [snip]
> >
> >     --
> >     https://mail.python.org/mailman/listinfo/python-list
> >

