[Tutor] check if two differently shaped dataframes are contained one another

ThreeBlindQuarks threesomequarks at proton.me
Mon Apr 3 19:42:04 EDT 2023


Dave,

This looks a bit like homework, so I think asking for a motivation is possibly not going to result in much! LOL!

But going with it, I would prefer to see the problem done more in the context of a matrix.

The problem can then be restated: given an NxM matrix and a larger matrix (meaning its row and column counts are greater than or equal to those of the small one), can you find a block of the larger matrix, with its upper left corner at X,Y, such that every element in rows X through X+N-1 and columns Y through Y+M-1 is the same as the corresponding element of the small matrix?

There are many ways to do this including some I suspect are not acceptable in the course.

But the algorithm basically can look like this in English:

I will call the smaller matrix the ORIGINAL. The other I call LARGER.

So first if the dimensions of LARGER are too small, the answer is FALSE as in cannot be found.

Then you want a nested loop over all candidate upper left corners: the row index runs from the first row and stops N or so back from the number of rows in LARGER, and an inner loop does the same for the columns. At each step of the loop you extract a matrix of the same size as ORIGINAL and compare it to ORIGINAL.
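
The loop described above might be sketched like this. It is a minimal sketch, assuming the data has been converted to numpy arrays; the function name find_submatrix is made up for illustration, and the sample values are the ones from the original question.

```python
import numpy as np

def find_submatrix(original, larger):
    """Return (row, col) of the first place ORIGINAL appears in LARGER, else None."""
    original = np.asarray(original)
    larger = np.asarray(larger)
    n, m = original.shape
    rows, cols = larger.shape
    # If LARGER is too small in either dimension, there cannot be a match.
    if n > rows or m > cols:
        return None
    # Try every upper left corner that leaves room for an n x m block.
    for x in range(rows - n + 1):
        for y in range(cols - m + 1):
            block = larger[x:x + n, y:y + m]
            if np.array_equal(block, original):
                return (x, y)
    return None

original = [[1, 2], [3, 4]]
larger = [[5, 9, 3, 8], [7, 1, 2, 0], [6, 3, 4, 10]]
print(find_submatrix(original, larger))  # (1, 1), counting from zero
```

Returning the coordinates rather than a plain TRUE/FALSE is just one choice; a caller who only wants a yes/no can test the result against None.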

Using a list comprehension might do most of the work in a single line depending what info you want to return.

Do you want a TRUE/FALSE, or the coordinates at which there is ONE match or maybe a list of many matches?
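
As one possible reading of the list comprehension idea, the sketch below collects every matching corner in a single expression and then derives the other answers from that list. It assumes numpy arrays; the variable names (matches, found, first) are made up for illustration.

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])
larger = np.array([[5, 9, 3, 8], [7, 1, 2, 0], [6, 3, 4, 10]])

n, m = original.shape
rows, cols = larger.shape

# Every (row, col) corner where the n x m block equals ORIGINAL.
# If LARGER is too small, the ranges are empty and so is the list.
matches = [
    (x, y)
    for x in range(rows - n + 1)
    for y in range(cols - m + 1)
    if np.array_equal(larger[x:x + n, y:y + m], original)
]

found = bool(matches)                     # TRUE/FALSE answer
first = matches[0] if matches else None   # coordinates of ONE match, if any
print(found, first, matches)
```

Building the full list does more work than needed when only a TRUE/FALSE is wanted; wrapping the same generator in any() would stop at the first match.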

Are there many other ways? Sure. But once you have two matrices of the same dimensions, equality can easily be measured without writing another explicit loop. Keeping the data in a pandas DataFrame may also be doable, as it can be addressed by row/column numbers.
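
For what it is worth, here is a sketch of the same search kept in pandas, using .iloc to address blocks by row/column position. The function name contains_frame is hypothetical. The comparison is done on the underlying values, on the assumption that row and column labels should not matter for this problem.

```python
import pandas as pd

small = pd.DataFrame([[1, 2], [3, 4]])
big = pd.DataFrame([[5, 9, 3, 8], [7, 1, 2, 0], [6, 3, 4, 10]])

def contains_frame(big, small):
    """Return True if the values of `small` appear as a contiguous block in `big`."""
    n, m = small.shape
    rows, cols = big.shape
    target = small.to_numpy()
    # .iloc slices by position; comparing the raw values ignores the labels,
    # which would otherwise differ between a slice of `big` and `small`.
    return any(
        (big.iloc[x:x + n, y:y + m].to_numpy() == target).all()
        for x in range(rows - n + 1)
        for y in range(cols - m + 1)
    )

print(contains_frame(big, small))  # True
```

The element-wise == followed by .all() is the comparison done without an extra explicit loop; any() stops at the first block that matches.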

But do note the method chosen may have odd results if the two data structures are different in some ways such as including integers or floating point or even complex numbers or other derived objects.
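
A small illustration of that caveat, assuming numpy and pandas: the same numbers stored as integers in one frame and as floats in the other compare equal by value but not under DataFrame.equals, which also requires matching column dtypes. Computed floats may additionally need a tolerance rather than exact equality.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame([[1, 2], [3, 4]])            # integer columns
floats = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])  # float columns

# Value comparison: 1 == 1.0, so the arrays count as equal.
values_equal = np.array_equal(ints.to_numpy(), floats.to_numpy())

# DataFrame.equals also checks dtypes, so int vs float columns differ.
frames_equal = ints.equals(floats)

# Floating point results can differ in the last bits.
exact = np.array_equal(np.array([0.1 + 0.2]), np.array([0.3]))
close = np.allclose(np.array([0.1 + 0.2]), np.array([0.3]))

print(values_equal, frames_equal, exact, close)  # True False False True
```

Which of these behaviours counts as the right answer depends on what "the same" is supposed to mean for the data at hand.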

- Q






------- Original Message -------
On Monday, April 3rd, 2023 at 7:02 PM, dn via Tutor <tutor at python.org> wrote:


> On 04/04/2023 06.17, marc nicole wrote:
> 
> > Hello,
> > 
> > I have this first dataframe
> > 
> > 1 2
> > 3 4
> > 
> > and this second dataframe
> > 
> > 5 9 3 8
> > 7 1 2 0
> > 6 3 4 10
> > 
> > and i want to check whether the first dataframe is contained in the second
> > (which is the case here)
> > 
> > I know that .isin() and .compare() both require that the dataframes are
> > shaped the same.
> > Note: I am looking for a "pandas" oriented solution
> 
> 
> This looks like fun...
> 
> What is the use-case, the reason for searching for such sub-set data-frames?
> 
> What have you tried thus-far?
> 
> Why does it have to be a pandas-implementation?
> 
> --
> Regards,
> =dn

