Extracting dataframe column with multiple conditions on row values

Edmondo Giovannozzi edmondo.giovannozzi at gmail.com
Sat Jan 8 08:00:43 EST 2022


Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto:
> Salaam Mahmood,
> On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote: 
> > I have a csv file like this 
> > V0,V1,V2,V3 
> > 4,1,1,1 
> > 6,4,5,2 
> > 2,3,6,7 
> > 
> > And I want to search two rows for a match and find the column. For 
> > example, I want to search row[0] for 1 and row[1] for 5. The corresponding 
> > column is V2 (which is the third column). Then I want to return the value 
> > at row[2] and the found column. The result should be 6 then.
> Not quite: isn't the "found column" also required?
> > I can manually extract the specified rows (with index 0 and 1 which are 
> > fixed) and manually iterate over them like arrays to find a match. Then I
> Perhaps this idea has been influenced by a similar solution in another 
> programming language. May I suggest that the better-answer you seek lies 
> in using Python idioms (as well as Python's tools)...
> > key1 = 1 
> > key2 = 5
> Fine, so far - excepting that this 'problem' is likely to be a small 
> part of some larger system. Accordingly, consider writing it as a 
> function. In which case, these two "keys" will become 
> function-parameters (and the two 'results' become return-values).
> > row1 = df.iloc[0] # row=[4,1,1,1] 
> > row2 = df.iloc[1] # row=[6,4,5,2]
> This is likely not native-Python. Let's create lists for 'everything', 
> just-because: 
> 
> >>> headings = [ "V0","V1","V2","V3" ] 
> >>> row1 = [4,1,1,1] 
> >>> row2 = [6,4,5,2] 
> >>> results = [ 2,3,6,7 ] 
> 
> 
> Note how I'm using the Python REPL (in a "terminal", type "python" (as 
> appropriate to your OpSys) at the command-line). IMHO the REPL is a 
> grossly under-rated tool, and is a very good means towards 
> trial-and-error, and learning by example. Highly recommended! 
> 
> 
> > for i in range(len(row1)): 
> 
> This construction is very much a "code smell" for thinking that it is 
> not "pythonic". (and perhaps the motivation for this post) 
> 
> In Python (compared with many other languages) the "for" loop should 
> actually be pronounced "for-each". In other words when we pair the 
> code-construct with a list (for example): 
> 
> for each item in the list the computer should perform some suite of 
> commands. 
> 
> (the "suite" is everything 'inside' the for-each-loop - NB my 
> 'Python-betters' will quickly point-out that this feature is not limited 
> to Python-lists, but will work with any :iterable" - ref: 
> https://docs.python.org/3/tutorial/controlflow.html#for-statements) 
> 
> 
> Thus: 
> 
> > for item in headings: print( item ) 
> ... 
> V0 
> V1 
> V2 
> V3 
> 
> 
> The problem is that when working with matrices/matrixes, a math 
> background equips one with the idea of indices/indexes, eg the 
> ubiquitous subscript-i. Accordingly, when reading 'math' where a formula 
> uses the upper-case Greek "sigma" character, remember that it means "for 
> all" or "for each"! 
> 
> So, if Python doesn't use indexing or "pointers", how do we deal with 
> the problem? 
> 
> Unfortunately, at first glance, the pythonic approach may seem 
> more-complicated or even somewhat convoluted, but once the concepts 
> (and/or the Python idioms) are learned, it is quite manageable (and 
> applicable to many more applications than matrices/matrixes!)...
> > if row1[i] == key1: 
> > for j in range(len(row2)): 
> > if row2[j] == key2: 
> > res = df.iloc[:,j] 
> > print(res) # 6 
> > 
> > Is there any way to use built-in function for a more efficient code?
> This is where your idea bears fruit! 
> 
> There is a Python "built-in function": zip(), which will 'join' lists. 
> NB do not become confused between zip() and zip archive/compressed files! 
> 
> Most of the time reference book and web-page examples show zip() being 
> used to zip-together two lists into a single data-construct (which is an 
> iterable(!)). However, zip() will actually zip-together multiple (more 
> than two) "iterables". As the manual says: 
> 
> «zip() returns an iterator of tuples, where the i-th tuple contains the 
> i-th element from each of the argument iterables.» 
> 
> Ah, so that's where the math-idea of subscript-i went! It has become 
> 'hidden' in Python's workings - or putting that another way: Python 
> looks after the subscripting for us (and given that 'out by one' errors 
> in pointers is a major source of coding-error in other languages, 
> thank-you very much Python!) 
> 
> First re-state the source-data as Python lists, (per above) - except 
> that I recommend the names be better-chosen to be more meaningful (to 
> your application)! 
> 
> 
> Now, (in the REPL) try using zip(): 
> 
> >>> zip( headings, row1, row2, results ) 
> <zip object at 0x7f655cca6bc0> 
> 
> Does that seem a very good illustration? Not really, but re-read the 
> quotation from the manual (above) where it says that zip returns an 
> iterator. If we want to see the values an iterator will produce, then 
> turn it into an iterable data-structure, eg: 
> 
> >>> list( zip( headings, row1, row2, results ) ) 
> [('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)] 
> 
> or, to see things more clearly, let me re-type it as: 
> 
> [ 
> ('V0', 4, 6, 2), 
> ('V1', 1, 4, 3), 
> ('V2', 1, 5, 6), 
> ('V3', 1, 2, 7) 
> ] 
> 
> 
> What we now see is actually a "transpose" of the original 'matrix' 
> presented in the post/question! 
> 
> (NB Python will perform this layout for us - read about the pprint library) 
> 
> 
> Another method which can also be employed (and which will illustrate the 
> loop required to code the eventual-solution(!)) is that Python's next() 
> will extract the first row of the transpose: 
> 
> >>> row = next( zip( headings, row1, row2, results ) ) 
> >>> row 
> ('V0', 4, 6, 2) 
> 
> 
> This is all-well-and-good, but that result is a tuple of four items 
> (corresponding to one column in the way the source-data was explained). 
> 
> If we need to consider the four individual data-items, that can be 
> improved using a Python feature called "tuple unpacking". Instead of the 
> above delivering a tuple which is then assigned to "row", the tuple can 
> be assigned to four "identifiers", eg 
> 
> >>> heading, row1_item, row2_item, result= next( zip( headings, row1, 
> row2, results ) ) 
> 
> (apologies about email word-wrapping - this is a single line of Python-code) 
> 
> 
> Which, to prove the case, could be printed: 
> 
> >>> heading, row1_item, row2_item, result 
> ('V0', 4, 6, 2) 
> 
> 
> (ref: 
> https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences) 
> 
> 
> Thus, if we repeatedly ask for the next() row from the zip-ped 
> transpose, eventually it will respond with the row starting 'V2' - which 
> is the desired-result, ie the row containing the 1, the 5, and the 6 - 
> and if you follow-through using the REPL, will be clearly visible. 
> 
> 
> Finally, 'all' that is required, is a for-each-loop which will iterate 
> across/down the zip object, one tuple (row of the transpose) at a time, 
> AND perform the "tuple-unpacking" all in one command, with an 
> if-statement to detect the correct row/column: 
> 
> >>> for *tuple-unpacking* in *zip() etc*: 
> ... if row1_item == *what?* and row2_item == *what?* 
> ... print( *which* and *which identifier* ) 
> ... 
> V2 6 
> 
> Yes, three lines. It's as easy as that! 
> (when you know how) 
> 
> Worse: when you become more expert, you'll be able to compress all of 
> that down into a single-line solution - but it won't be as "readable" as 
> is this! 
> 
> 
> NB this question has a 'question-smell' of 'homework', so I'll not 
> complete the code for you - this is something *you* asked to learn and 
> the best way to learn is by 'doing' (not by 'reading'). 
> 
> However, please respond with your solution, or any further question 
> (with the next version of the code so-far, per this first-post - which 
> we appreciate!) 
> 
> Regardless, you asked 'the right question' (curiosity is the key to 
> learning) and in the right way/manner. Well done! 
> 
> 
> NBB the above code-outline does not consider the situation where the 
> search fails/the keys are not found! 
> 
> 
> For further information, please review: 
> https://docs.python.org/3/library/functions.html?highlight=zip#zip 
> 
> Also, further to the above discussion of combining lists and loops: 
> https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques 
> 
> and with a similar application (to this post): 
> https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list 
> 
> -- 
> Regards, 

You may also transpose your dataset. Then the index will become your column name and the column name become your index:
To read your dataset:

import pandas as pd
import io

DN = """
V0,V1,V2,V3
4,1,1,1
6,4,5,2
2,3,6,7
"""
df = pd.read_csv(io.StringIO(DN))

Transpose it:

dft = df.T

Find all the index with your condition:
 
idt = (dft[0] == 1) & (dft[1] == 5)

Print the columns that satisfy your condition:

print(dft[idt])

As you see, without explicit loop.


More information about the Python-list mailing list