[Tutor] pls. help me in sorting and choosing

Thu Aug 11 06:59:28 CEST 2005

Quoting Srinivas Iyyer <srini_iyyer_bio at yahoo.com>:

> My question is how can I code to distinguish all high
> scoring group and all low scoring group. 

One thing you need to decide is what it means to be high scoring.  Is an element
high scoring if its score is above some threshold, or if it falls within some
top percentage?  Or something else?

(e.g. "an element is high scoring if its score is > 1000" or "an element is high
scoring if it is in the top 20% when ranked by score")

Do you know how to read your data in from your file?  If you have a file looking
like this:

NM_004619.2	4910.8
NM_004619.2	2716.3
NM_145759.1	4805.7
NM_14 5759.1	2716.3
XM_378692.1	56.00

then I would convert that into a list of tuples:

[("NM_004619.2", 4910.8), ("NM_004619.2", 2716.3), ("NM_145759.1", 4805.7),
("NM_14 5759.1", 2716.3), ("XM_378692.1", 56.00)]
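In case the reading step is the sticking point, here is a minimal sketch of it.
It assumes each line holds a name followed by whitespace and a score; the
function name parse_scores and the file name "scores.txt" are made up for
illustration.

```python
def parse_scores(lines):
    """Turn lines of '<name><whitespace><score>' into (name, score) tuples."""
    data = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace, so the score is the final field.
        name, score = line.rsplit(None, 1)
        data.append((name, float(score)))
    return data

# With a real file (hypothetical name) you could write:
#     data = parse_scores(open("scores.txt"))
# Here a short list of strings stands in for the file.
sample = ["NM_004619.2\t4910.8\n", "XM_378692.1\t56.00\n"]
data = parse_scores(sample)
```

Any iterable of strings works as input, so an open file object can be passed in
directly.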

If you can do this, then it is easy to ask Python to sort it for you.

>>> data = [("NM_004619.2", 4910.8), ("NM_004619.2", 2716.3),
...         ("NM_145759.1", 4805.7), ("NM_14 5759.1", 2716.3),
...         ("XM_378692.1", 56.00)]
>>> data.sort(key=lambda x: x[1], reverse=True)
>>> data
[('NM_004619.2', 4910.8000000000002), ('NM_145759.1', 4805.6999999999998),
('NM_004619.2', 2716.3000000000002), ('NM_14 5759.1', 2716.3000000000002),
('XM_378692.1', 56.0)]

Now, the first elements in the list have the highest score, and you can decide
how far down to go.
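If "how far down" means a percentage, one possible sketch is to sort and then
slice off the leading part of the list.  The helper name top_fraction is my own
invention, and rounding the count down is just one choice you could make.

```python
def top_fraction(data, fraction):
    """Return the highest-scoring `fraction` (0.0 to 1.0) of (name, score) tuples."""
    ranked = sorted(data, key=lambda x: x[1], reverse=True)
    # Assumption: round the cut-off down, so 50% of 5 items keeps 2.
    count = int(len(ranked) * fraction)
    return ranked[:count]

data = [("NM_004619.2", 4910.8), ("NM_004619.2", 2716.3),
        ("NM_145759.1", 4805.7), ("XM_378692.1", 56.00)]
top = top_fraction(data, 0.5)   # the best-scoring half
```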

Alternatively, you could ask for all elements above a certain score:

>>> [x for x in data if x[1] > 3000]
[('NM_004619.2', 4910.8000000000002), ('NM_145759.1', 4805.6999999999998)]
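Since the question asked for both a high and a low scoring group, the same idea
can be run twice, once for each side of the cut-off.  This is only a sketch: the
name split_by_threshold is hypothetical, and I have assumed a score exactly
equal to the threshold counts as low.

```python
def split_by_threshold(data, threshold):
    """Split (name, score) tuples into (high, low) lists around a threshold."""
    high = [x for x in data if x[1] > threshold]
    # Assumption: a score equal to the threshold goes in the low group.
    low = [x for x in data if x[1] <= threshold]
    return high, low

data = [("NM_004619.2", 4910.8), ("NM_004619.2", 2716.3),
        ("NM_145759.1", 4805.7), ("XM_378692.1", 56.00)]
high, low = split_by_threshold(data, 3000)
```

Each group keeps the order the elements had in data, so sorting first gives you
two ranked groups.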

HTH!

(Note: the key= and reverse= arguments to sort() are a new feature in Python 2.4.
If you are using an earlier Python, you will have to do things slightly
differently.  Probably the easiest change would be to make the score the first
element of each tuple, so that a plain sort() orders by score; reverse() then
puts the highest scores first.)
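For an older Python, a rough sketch of the score-first approach might look like
this.  It avoids key= and reverse= entirely; the variable names are just for
illustration.

```python
data = [("NM_004619.2", 4910.8), ("NM_004619.2", 2716.3),
        ("NM_145759.1", 4805.7), ("XM_378692.1", 56.00)]

# Put the score first so that a plain sort() compares scores.
decorated = [(score, name) for (name, score) in data]
decorated.sort()       # ascending by score (ties fall back to the name)
decorated.reverse()    # highest score first

# Swap back to the original (name, score) layout.
ranked = [(name, score) for (score, name) in decorated]
```

One difference to be aware of: elements with equal scores end up ordered by
name here, rather than keeping their original relative order.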

-- 
John.