[Tutor] R: re question on array

Peter Otten __peter__ at web.de
Thu Oct 30 10:14:27 CET 2014


jarod_v6 at libero.it wrote:

> Dear All,
> Sorry for my bad presentation of my problem!!
> I have this type of input:
> A file with a long list of genes and the sample each one occurs in:
> 
> gene	Samples
> FUS	SampleA
> TP53	SampleA
> ATF4	SampleB
> ATF3	SampleC
> ATF4	SampleD
> FUS	SampleE
> RORA	SampleE
> RORA	SampleC
> 
> What I want to obtain is a matrix with the occurrences per sample.
> 	SampleA	SampleB	SampleC	SampleD	SampleE
> FUS	1	0	0	0	1
> TP53	1	0	0	0	0
> ATF4	0	1	0	1	0
> ATF3	0	0	1	0	0
> RORA	0	0	1	0	1
> 
> That way I can count the occurrences quickly!
> 
> At the moment I am only able to build the list of row names and the list of
> sample names. Unfortunately I don't know how to create this matrix.
> Could you help me?
> Thanks for the patience and the help

Open the file, skip the first line and convert the remaining lines into 
(gene, sample) tuples. I assume that you know enough Python to do that.
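
In case that step is the sticking point, here is a minimal sketch. It assumes 
the columns are separated by tabs, as in your example. The helper name 
read_pairs and the file name genes.txt are made up for illustration, and an 
in-memory file stands in for the real one so the snippet runs on its own:

```python
import io


def read_pairs(lines):
    """Yield (gene, sample) tuples from tab-separated lines, skipping the header."""
    lines = iter(lines)
    next(lines, None)  # skip the header line "gene<TAB>Samples"
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        gene, sample = line.split("\t")
        yield gene, sample


# Stand-in for open("genes.txt"); the file name is hypothetical.
sample_file = io.StringIO(
    "gene\tSamples\n"
    "FUS\tSampleA\n"
    "TP53\tSampleA\n"
    "ATF4\tSampleB\n"
)
pairs = list(read_pairs(sample_file))
print(pairs)
```

With a real file you would pass the open file object to read_pairs() instead 
of the StringIO.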

Then build a dict that maps (gene, sample) tuples to the number of occurrences:

pivot = {
   ("FUS", "SampleA"): 1,
   ...
   ("RORA", "SampleC"): 1,
}

Remember to handle both the case when the tuple is already in the dict and 
the case when it is not. (Once you have done that successfully, have a look 
at the collections.Counter class.)
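
One possible way to do the counting by hand, followed by the Counter 
equivalent for comparison. This is only a sketch: the pairs list is a small 
made-up sample, and the repeated ("FUS", "SampleA") entry is not in your data, 
I added it so that both branches are exercised:

```python
from collections import Counter

# Made-up sample data; the duplicate entry is only there for illustration.
pairs = [
    ("FUS", "SampleA"),
    ("TP53", "SampleA"),
    ("ATF4", "SampleB"),
    ("FUS", "SampleA"),
]

pivot = {}
for key in pairs:
    if key in pivot:
        pivot[key] += 1  # tuple seen before: increment its count
    else:
        pivot[key] = 1   # first time we see this tuple

# Counter does the same bookkeeping in a single call.
counted = Counter(pairs)

print(pivot)
print(counted == pivot)
```

A Counter is a dict subclass, so it can be used wherever the hand-built dict 
is used.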

Now you need the row/column labels. You can extract them from the dict with

# use set(...) to avoid duplicates
rows = sorted(set(row for row, column in pivot))
columns = ... # something very similar
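
If you get stuck on the label extraction, here is one way it might look. The 
pivot dict is a shortened, hand-written sample, and picking the second tuple 
element for the columns is my reading of "something very similar":

```python
# Shortened sample dict, written out by hand for illustration.
pivot = {
    ("FUS", "SampleA"): 1,
    ("ATF4", "SampleB"): 1,
    ("RORA", "SampleC"): 1,
    ("FUS", "SampleE"): 1,
}

# Iterating over a dict yields its keys, here the (gene, sample) tuples.
rows = sorted(set(row for row, column in pivot))
columns = sorted(set(column for row, column in pivot))

print(rows)
print(columns)
```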

You can then print the table with

print([""] + columns)
for row in rows:
    print([row] + [pivot.get((row, column), 0) for column in columns])

Use the str.format() method on the table cells to prettify the output and 
you're done.
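
A rough sketch of what the prettified output could look like, using the data 
from your mail. The function name format_table and the column width of 8 are 
arbitrary choices of mine, not the only way to do it:

```python
# The (gene, sample) counts from the example in the question.
pivot = {
    ("FUS", "SampleA"): 1,
    ("TP53", "SampleA"): 1,
    ("ATF4", "SampleB"): 1,
    ("ATF3", "SampleC"): 1,
    ("ATF4", "SampleD"): 1,
    ("FUS", "SampleE"): 1,
    ("RORA", "SampleE"): 1,
    ("RORA", "SampleC"): 1,
}
rows = sorted(set(row for row, column in pivot))
columns = sorted(set(column for row, column in pivot))


def format_table(pivot, rows, columns, width=8):
    """Return the pivot table as text with right-aligned, fixed-width cells."""
    cell = "{:>" + str(width) + "}"  # e.g. "{:>8}"
    lines = ["".join(cell.format(c) for c in [""] + columns)]
    for row in rows:
        values = [pivot.get((row, column), 0) for column in columns]
        lines.append("".join(cell.format(c) for c in [row] + values))
    return "\n".join(lines)


table = format_table(pivot, rows, columns)
print(table)
```

Missing (gene, sample) combinations show up as 0 because of the default 
passed to pivot.get().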


