XML parsing per record
Willem Ligtenberg
WLigtenberg at gmail.com
Fri Apr 22 09:22:17 EDT 2005
I'm trying to write the code using cElementTree, but I've stumbled across
one problem. Sometimes there are multiple values to
retrieve from one record for the same element, like this:
<Prot-ref_name_E>ATP-binding cassette, subfamily G, member 1</Prot-ref_name_E>
<Prot-ref_name_E>ATP-binding cassette 8</Prot-ref_name_E>
How do I get not only the first value but the rest as well, so that I can
store them in a list?
Thanks in advance,
Willem Ligtenberg
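
For what it's worth, here is a minimal sketch of one way this might be done:
findall() returns every matching element rather than only the first, and
iterparse() lets you handle one record at a time. It assumes each record is
wrapped in an <Entrezgene> element (the thread only shows <Entrezgene-Set>,
so that tag name is a guess), and it uses xml.etree.ElementTree, which offers
the same API as cElementTree in current Python. SAMPLE, texts() and
parse_records() are made-up names for illustration only.

```python
import io
import xml.etree.ElementTree as ET  # same API as the standalone cElementTree

# Tiny stand-in for the real file; the <Entrezgene> record tag is an assumption.
SAMPLE = """<Entrezgene-Set>
  <Entrezgene>
    <Gene-track_geneid>320632</Gene-track_geneid>
    <Prot-ref_name>
      <Prot-ref_name_E>ATP-binding cassette, subfamily G, member 1</Prot-ref_name_E>
      <Prot-ref_name_E>ATP-binding cassette 8</Prot-ref_name_E>
    </Prot-ref_name>
  </Entrezgene>
</Entrezgene-Set>"""


def texts(record, tag):
    """Return the text of every descendant of `record` named `tag`, in order."""
    # findall() yields all matches; find()/findtext() stop at the first one.
    return [elem.text for elem in record.findall(".//" + tag)]


def parse_records(source, record_tag="Entrezgene"):
    """Yield one dict per record, freeing each record's subtree afterwards."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {
                "id": elem.findtext(".//Gene-track_geneid"),
                "functions": texts(elem, "Prot-ref_name_E"),
            }
            elem.clear()  # drop the children so memory use stays low


records = list(parse_records(io.BytesIO(SAMPLE.encode("utf-8"))))
for record in records:
    print(record["id"], record["functions"])
```

The same texts() helper should work for the other repeated elements, such as
Gene-ref_syn_E and Prot-ref_ec_E.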
On Fri, 22 Apr 2005 13:48:15 +0200, Willem Ligtenberg wrote:
> This is all the info I need from the xml file:
> ID --> <Gene-track_geneid>320632</Gene-track_geneid>
>
> Name --> <Gene-ref>
> <Gene-ref_locus>Pzp</Gene-ref_locus>
>
> Startbase --> <Gene-commentary_seqs>
> <Seq-loc>
> <Seq-loc_int>
> <Seq-interval>
> <Seq-interval_from>126957426</Seq-interval_from>
> <Seq-interval_to>126989473</Seq-interval_to>
> <Seq-interval_strand>
> <Na-strand value="plus"/>
> </Seq-interval_strand>
> <Seq-interval_id>
> <Seq-id>
> <Seq-id_gi>51860766</Seq-id_gi>
> </Seq-id>
> </Seq-interval_id>
> </Seq-interval>
> </Seq-loc_int>
> </Seq-loc>
> </Gene-commentary_seqs>
> Endbase
>
> Function --> <Prot-ref_name>
> <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa</Prot-ref_name_E>
> <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa (DEXH RNA helicase
> family)</Prot-ref_name_E>
> </Prot-ref_name>
>
> DBLink --> <Gene-ref_locus-tag>MGI:2444401</Gene-ref_locus-tag>
> <Gene-commentary_source>
> <Other-source>
> <Other-source_src>
> <Dbtag>
> <Dbtag_db>GO</Dbtag_db>
> <Dbtag_tag>
> <Object-id>
> <Object-id_id>5524</Object-id_id>
> </Object-id>
> </Dbtag_tag>
> </Dbtag>
> </Other-source_src>
> <Other-source_anchor>ATP binding</Other-source_anchor>
> <Other-source_post-text>evidence: ISS</Other-source_post-text>
> </Other-source>
> </Gene-commentary_source>
>
> Product-type --> <Entrezgene_type value="protein-coding">6</Entrezgene_type>
>
> gene-comment --> <Gene-ref_desc>activating signal cointegrator 1 complex subunit 3-like
> 1</Gene-ref_desc>
>
> synonym --> <Gene-ref_syn>
> <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
> <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
> <Gene-ref_syn_E>U5-200KD</Gene-ref_syn_E>
> <Gene-ref_syn_E>U5-200-KD</Gene-ref_syn_E>
> <Gene-ref_syn_E>A330064G03Rik</Gene-ref_syn_E>
> </Gene-ref_syn>
>
> EC --> <Prot-ref_ec>
> <Prot-ref_ec_E>1.5.1.5</Prot-ref_ec_E>
> <Prot-ref_ec_E>3.5.4.9</Prot-ref_ec_E>
> </Prot-ref_ec>
>
> Chromosome: <SubSource>
> <SubSource_subtype value="chromosome">1</SubSource_subtype>
> <SubSource_name>6</SubSource_name>
> </SubSource>
>
> Some of these can occur more than once in a record.
>
>
> On Fri, 22 Apr 2005 02:41:46 -0400, William Park wrote:
>
>> Willem Ligtenberg <WLigtenberg at gmail.com> wrote:
>>> On Sun, 17 Apr 2005 02:16:04 +0000, William Park wrote:
>>> > Care to post more details?
>>>
>>> The XML file I need to parse contains information about genes.
>>> So the first element is a gene and then there are a lot of sub-elements with
>>> sub-elements. I only need some of the information and want to store it in
>>> an object called gene. Later on this information will be printed into a
>>> file, which in its turn will be fed into some other program.
>>
>> You have to help us a little more here. Which info do you want to
>> extract from below example?
>>
>>> <Entrezgene-Set>
>>> ...
>>> </Entrezgene-Set>