[Baypiggies] Fwd: manipulating lists question

Martin Falatic martin at falatic.com
Thu Dec 5 11:40:31 CET 2013


Actually, I would leave my solution as-is functionality-wise... the reason
I split and rejoin the string list is to ensure it doesn't end up with
duplicates. It'd be more efficient to simply carry that data as a list or
set rather than a comma-delimited string until one gets to the final output
stage, though... then you can format it more precisely on output (sorting
might be desired, for example).
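
To make that concrete, here is a rough sketch of the "carry a set, format at
the end" idea. The names rows, collapse and format_output are made up for
illustration, the rows are trimmed to their first three fields, and it
assumes the third field is always the same for a given key:

```python
# Sketch only: rows, collapse and format_output are illustrative names.
rows = [
    ['6558', 'NM_001046.2', 'SLC12A2'],
    ['1302', 'NM_080680.2', 'COL11A2'],
    ['1302', 'NM_080679.2', 'COL11A2'],
    ['1302', 'NM_080680.2', 'COL11A2'],  # deliberate duplicate
]


def collapse(rows):
    """Group rows by their first field, collecting second fields in a set."""
    grouped = {}
    for row in rows:
        key, label, name = row[0], row[1], row[2]
        if key not in grouped:
            # Assumption: name is identical for every row sharing this key,
            # so the first one seen is kept.
            grouped[key] = [set(), name]
        grouped[key][0].add(label)  # a set silently ignores duplicates
    return grouped


def format_output(grouped):
    """Turn each set into a sorted, comma-delimited string for display."""
    return {key: [",".join(sorted(labels)), name]
            for key, (labels, name) in grouped.items()}


result = format_output(collapse(rows))
print(result)
```

The set takes care of de-duplication on its own, and sorting only happens
once, at output time.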

I'll reply to your other note separately. This is, however, the core of it.
 - Marty

On Thu, December 5, 2013 02:32, Martin Falatic wrote:
> Actually, I'll just throw this out there since it's late and I have to
> get up early...
>
> I didn't post my solution originally because the other solutions were so
> much more elegant and a skosh more Pythonic too. The list split/join is a
> little overcomplicated but you can easily meld this with the other
> solutions to get something a bit more concise. However, a minor tweak to
> it satisfies this problem immediately, if the complication I asked about
> isn't a concern (if it is, it's pretty easy to fix):
>
> y = dict()
> for part in x:
>     if part[0] in y:
>         mylist = y[part[0]][0].split(",")
>         if part[1] not in mylist:
>             mylist.append(part[1])
>         y[part[0]][0] = ",".join(mylist)
>     else:
>         y[part[0]] = part[1:3]  # truncate the rest of the data subset
> print y
>
> I got this, which seems like what you wanted:
> {'6558': ['NM_001046.2', 'SLC12A2'], '1302': ['NM_080679.2,NM_080680.2',
> 'COL11A2']}
>
>
> when I tested this against this:
>
> x = [\
> ['6558', 'NM_001046.2', 'SLC12A2', '6037226', '2', 'chr5',
> '127502453', '127502454', 'het-ref', 'snp', 'A', 'T', 'A', '185',
> '113', '184', '112', 'VQHIGH', 'VQHIGH', '', '', '', '', '259974',
> '9', '6', '6', '15', '6558:NM_001046.2:SLC12A2:CDS:MISSENSE',
> '6558:NM_001046.2:SLC12A2:CDS:NO-CHANGE', 'PFAM:PF01490:Aa_trans', '',
> '', '', '0.99', '2', '0.99', '0.998', '1.01', '1.000', '0.5', '0.46',
> '0.5', '1', '18', '18', '19', 'ref-identical;onlyA', 'snp', '0.072',
> '-1', 'SQHIGH']\
> ,\
> ['1302', 'NM_080679.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458',
> '458', 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106',
> '106', '140', '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> '0.990', '6', 'SQHIGH']\
> ,\
> ['1302', 'NM_080680.2', 'COL11A2', '6525172', '2', 'chr6', '33271374',
> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458',
> '458', 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106',
> '106', '140', '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
> '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA', 'del',
> '0.990', '6', 'SQHIGH']\
> ]
>
>
> On Thu, December 5, 2013 02:22, Martin Falatic wrote:
>
>> Ah, genetics! Intriguing...
>>
>>
>>
>> Do you need anything beyond the third element of each list? Does the
>> third element always map 1:1 with the first, or could it vary? If so,
>> what then?
>>
>> To refer to the simplified example, could you have this?
>> x = [['cat', 'NM123', 12], ['cat', 'NM234', 43], ['dog', 'NM56', 65]]
>>
>> If so, what is the expected output?
>>
>>
>>
>> - Marty
>>
>>
>>
>>
>>
>> On Thu, December 5, 2013 02:11, Vikram K wrote:
>>
>>
>>> I am having some difficulty in applying this to my actual problem,
>>> although I love the dictionary method. Imagine the following three
>>> lists are the first, second and third elements of a larger list:
>>>
>>> >>> comp[6]
>>> ['6558', 'NM_001046.2', 'SLC12A2', '6037226', '2', 'chr5',
>>> '127502453',
>>> '127502454', 'het-ref', 'snp', 'A', 'T', 'A', '185', '113', '184',
>>> '112',
>>> 'VQHIGH', 'VQHIGH', '', '', '', '', '259974', '9', '6', '6', '15',
>>> '6558:NM_001046.2:SLC12A2:CDS:MISSENSE',
>>> '6558:NM_001046.2:SLC12A2:CDS:NO-CHANGE', 'PFAM:PF01490:Aa_trans', '',
>>>  '',
>>> '', '0.99', '2', '0.99', '0.998', '1.01', '1.000', '0.5', '0.46',
>>> '0.5',
>>> '1', '18', '18', '19', 'ref-identical;onlyA', 'snp', '0.072', '-1',
>>> 'SQHIGH']
>>>
>>>
>>>
>>>
>>> >>> comp[7]
>>> ['1302', 'NM_080679.2', 'COL11A2', '6525172', '2', 'chr6',
>>> '33271374',
>>> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458',
>>> '458',
>>> 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106',
>>> '140',
>>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
>>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
>>> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
>>>  '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA',
>>> 'del',
>>> '0.990', '6', 'SQHIGH']
>>>
>>>
>>>
>>>
>>> >>> comp[8]
>>> ['1302', 'NM_080680.2', 'COL11A2', '6525172', '2', 'chr6',
>>> '33271374',
>>> '33271376', 'het-ref', 'del', 'GT', '', 'GT', '542', '542', '458',
>>> '458',
>>> 'VQHIGH', 'VQHIGH', '', '', '', '', '71150', '34', '106', '106',
>>> '140',
>>> '1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC',
>>> '1302:NM_080679.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080680.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;1302:NM_080681.2:COL11A2:TSS-UPSTREAM:UNKNOWN-INC;6257:NM_021976.3:RXRB:CDS:NO-CHANGE',
>>> '', '', '', '', '0.95', '2', '0.98', '0.998', '0.99', '1.000', '0.46',
>>>  '0.42', '0.5', '0', '102', '102', '102', 'ref-identical;onlyA',
>>> 'del',
>>> '0.990', '6', 'SQHIGH']
>>>
>>>
>>>
>>> >>>
>>>
>>> ------
>>> Can we apply the dictionary method to this problem, where the key of
>>> the dictionary is the first element of each of the three smaller lists
>>> ('6558', '1302', '1302')? The second and third elements of the larger
>>> list (the ones starting with '1302') need to be collapsed into a single
>>> element, combining their second elements ('NM_080679.2' and
>>> 'NM_080680.2') in a way similar to how we had tackled the toy problem:
>>>
>>> x = [['cat', 'NM123', 12], ['cat', 'NM234', 12], ['dog', 'NM56', 65]]
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 5, 2013 at 4:18 AM, Michiel Overtoom <motoom at xs4all.nl>
>>> wrote:
>>>
>>>
>>>
>>>
>>>>
>>>> On Dec 5, 2013, at 10:09, Vikram K wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> Another option could have been to obtain a dictionary like so:
>>>>> {'dog': ['NM56', 65], 'cat': ['NM123,NM234', 12]}
>>>>>
>>>>>
>>>>>
>>>>
>>>> Oh, in that case the code can become somewhat simpler:
>>>>
>>>>
>>>>
>>>>
>>>> x = [['cat', 'NM123', 12], ['cat', 'NM234', 12], ['dog', 'NM56',
>>>> 65]]
>>>>
>>>>
>>>>
>>>> d = {}
>>>> for key, label, quant in x:
>>>>     if key in d:
>>>>         d[key][0] += ", " + label
>>>>     else:
>>>>         d[key] = [label, quant]
>>>>
>>>> print d
>>>>
>>>>
>>>> I agree with Michael that the problem is somewhat underspecified,
>>>> but it's a starting point.
>>>>
>>>> Greetings,
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> "If you don't know, the thing to do is not to get scared, but to
>>>> learn." - Ayn Rand
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>