Extracting dataframe column with multiple conditions on row values

dn PythonList at DancesWithMice.info
Fri Jan 7 20:21:05 EST 2022


Salaam Mahmood,


On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote:
>    I have a csv file like this
>    V0,V1,V2,V3
>    4,1,1,1
>    6,4,5,2
>    2,3,6,7
> 
>    And I want to search two rows for a match and find the column. For
>    example, I want to search row[0] for 1 and row[1] for 5. The corresponding
>    column is V2 (which is the third column). Then I want to return the value
>    at row[2] and the found column. The result should be 6 then.

Not quite: isn't the "found column" also required?


>    I can manually extract the specified rows (with index 0 and 1 which are
>    fixed) and manually iterate over them like arrays to find a match. Then I

Perhaps this idea has been influenced by a similar solution in another
programming language. May I suggest that the better-answer you seek lies
in using Python idioms (as well as Python's tools)...


>    key1 = 1
>    key2 = 5

Fine, so far - excepting that this 'problem' is likely to be a small
part of some larger system. Accordingly, consider writing it as a
function. In which case, these two "keys" will become
function-parameters (and the two 'results' become return-values).


>    row1 = df.iloc[0]              # row=[4,1,1,1]
>    row2 = df.iloc[1]              # row=[6,4,5,2]

This is likely not native-Python. Let's create lists for 'everything',
just-because:

>>> headings = [ "V0","V1","V2","V3" ]
>>> row1 = [4,1,1,1]
>>> row2 = [6,4,5,2]
>>> results = [ 2,3,6,7 ]


Note how I'm using the Python REPL (in a "terminal", type "python" (as
appropriate to your OpSys) at the command-line). IMHO the REPL is a
grossly under-rated tool, and is a very good means towards
trial-and-error, and learning by example. Highly recommended!


>    for i in range(len(row1)):

This construction is very much a "code smell" for thinking that it is
not "pythonic". (and perhaps the motivation for this post)

In Python (compared with many other languages) the "for" loop should
actually be pronounced "for-each". In other words when we pair the
code-construct with a list (for example):

for each item in the list the computer should perform some suite of
commands.

(the "suite" is everything 'inside' the for-each-loop - NB my
'Python-betters' will quickly point-out that this feature is not limited
to Python-lists, but will work with any :iterable" - ref:
https://docs.python.org/3/tutorial/controlflow.html#for-statements)


Thus:

> for item in headings: print( item )
...
V0
V1
V2
V3


The problem is that when working with matrices/matrixes, a math
background equips one with the idea of indices/indexes, eg the
ubiquitous subscript-i. Accordingly, when reading 'math' where a formula
uses the upper-case Greek "sigma" character, remember that it means "for
all" or "for each"!

So, if Python doesn't use indexing or "pointers", how do we deal with
the problem?

Unfortunately, at first glance, the pythonic approach may seem
more-complicated or even somewhat convoluted, but once the concepts
(and/or the Python idioms) are learned, it is quite manageable (and
applicable to many more applications than matrices/matrixes!)...


>        if row1[i] == key1:
>            for j in range(len(row2)):
>                 if row2[j] == key2:
>                     res = df.iloc[:,j]
>                     print(res)    # 6
> 
>    Is there any way to use built-in function for a more efficient code?


This is where your idea bears fruit!

There is a Python "built-in function": zip(), which will 'join' lists.
NB do not become confused between zip() and zip archive/compressed files!

Most of the time reference book and web-page examples show zip() being
used to zip-together two lists into a single data-construct (which is an
iterable(!)). However, zip() will actually zip-together multiple (more
than two) "iterables". As the manual says:

«zip() returns an iterator of tuples, where the i-th tuple contains the
i-th element from each of the argument iterables.»

Ah, so that's where the math-idea of subscript-i went! It has become
'hidden' in Python's workings - or putting that another way: Python
looks after the subscripting for us (and given that 'out by one' errors
in pointers is a major source of coding-error in other languages,
thank-you very much Python!)

First re-state the source-data as Python lists, (per above) - except
that I recommend the names be better-chosen to be more meaningful (to
your application)!


Now, (in the REPL) try using zip():

>>> zip( headings, row1, row2, results )
<zip object at 0x7f655cca6bc0>

Does that seem a very good illustration? Not really, but re-read the
quotation from the manual (above) where it says that zip returns an
iterator. If we want to see the values an iterator will produce, then
turn it into an iterable data-structure, eg:

>>> list( zip( headings, row1, row2, results ) )
[('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)]

or, to see things more clearly, let me re-type it as:

[
('V0', 4, 6, 2),
('V1', 1, 4, 3),
('V2', 1, 5, 6),
('V3', 1, 2, 7)
]


What we now see is actually a "transpose" of the original 'matrix'
presented in the post/question!

(NB Python will perform this layout for us - read about the pprint library)


Another method which can also be employed (and which will illustrate the
loop required to code the eventual-solution(!)) is that Python's next()
will extract the first row of the transpose:

>>> row = next( zip( headings, row1, row2, results ) )
>>> row
('V0', 4, 6, 2)


This is all-well-and-good, but that result is a tuple of four items
(corresponding to one column in the way the source-data was explained).

If we need to consider the four individual data-items, that can be
improved using a Python feature called "tuple unpacking". Instead of the
above delivering a tuple which is then assigned to "row", the tuple can
be assigned to four "identifiers", eg

>>> heading, row1_item, row2_item, result= next( zip( headings, row1,
row2, results ) )

(apologies about email word-wrapping - this is a single line of Python-code)


Which, to prove the case, could be printed:

>>> heading, row1_item, row2_item, result
('V0', 4, 6, 2)


(ref:
https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences)


Thus, if we repeatedly ask for the next() row from the zip-ped
transpose, eventually it will respond with the row starting 'V2' - which
is the desired-result, ie the row containing the 1, the 5, and the 6 -
and if you follow-through using the REPL, will be clearly visible.


Finally, 'all' that is required, is a for-each-loop which will iterate
across/down the zip object, one tuple (row of the transpose) at a time,
AND perform the "tuple-unpacking" all in one command, with an
if-statement to detect the correct row/column:

>>> for *tuple-unpacking* in *zip() etc*:
...     if row1_item == *what?* and row2_item == *what?*
...         print( *which* and *which identifier* )
...
V2 6

Yes, three lines. It's as easy as that!
(when you know how)

Worse: when you become more expert, you'll be able to compress all of
that down into a single-line solution - but it won't be as "readable" as
is this!


NB this question has a 'question-smell' of 'homework', so I'll not
complete the code for you - this is something *you* asked to learn and
the best way to learn is by 'doing' (not by 'reading').

However, please respond with your solution, or any further question
(with the next version of the code so-far, per this first-post - which
we appreciate!)

Regardless, you asked 'the right question' (curiosity is the key to
learning) and in the right way/manner. Well done!


NBB the above code-outline does not consider the situation where the
search fails/the keys are not found!


For further information, please review:
https://docs.python.org/3/library/functions.html?highlight=zip#zip

Also, further to the above discussion of combining lists and loops:
https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques

and with a similar application (to this post):
https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list

-- 
Regards,
=dn


More information about the Python-list mailing list