XML parsing per record

Willem Ligtenberg WLigtenberg at gmail.com
Fri Apr 22 09:22:17 EDT 2005


I'm trying to write the code using cElementTree, but I've run into one
problem: sometimes a record contains several values for the same
element, like this:
<Prot-ref_name_E>ATP-binding cassette, subfamily G, member 1</Prot-ref_name_E>
<Prot-ref_name_E>ATP-binding cassette 8</Prot-ref_name_E>

How do I get not only the first value but the rest as well, so that I can
store them all in a list?

Thanks in advance,

Willem Ligtenberg
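
A minimal sketch of one way to collect every occurrence, assuming the
ElementTree-style API (shown here with the standard-library
xml.etree.ElementTree, which offers the same interface as cElementTree).
The wrapping <Prot-ref> element and the variable names are assumptions
made only to keep the example self-contained; find() returns just the
first match, while findall() returns all of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <Prot-ref> wrapper is assumed for illustration,
# the two <Prot-ref_name_E> values are the ones quoted in the question.
record_xml = """
<Prot-ref>
  <Prot-ref_name>
    <Prot-ref_name_E>ATP-binding cassette, subfamily G, member 1</Prot-ref_name_E>
    <Prot-ref_name_E>ATP-binding cassette 8</Prot-ref_name_E>
  </Prot-ref_name>
</Prot-ref>
"""

record = ET.fromstring(record_xml)

# find() stops at the first matching element ...
first_name = record.find(".//Prot-ref_name_E").text

# ... while findall() returns every match, in document order.
names = [elem.text for elem in record.findall(".//Prot-ref_name_E")]

print(first_name)
print(names)
```

The ".//" prefix searches at any depth below the record, so the same
pattern should work for the other repeated elements as well.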

On Fri, 22 Apr 2005 13:48:15 +0200, Willem Ligtenberg wrote:

> This is all the info I need from the xml file:
> ID --> 	<Gene-track_geneid>320632</Gene-track_geneid>
> 
> Name --> 	<Gene-ref>
>         <Gene-ref_locus>Pzp</Gene-ref_locus>
>         
> Startbase --> <Gene-commentary_seqs>
>         <Seq-loc>
>           <Seq-loc_int>
>             <Seq-interval>
>               <Seq-interval_from>126957426</Seq-interval_from>
>               <Seq-interval_to>126989473</Seq-interval_to>
>               <Seq-interval_strand>
>                 <Na-strand value="plus"/>
>               </Seq-interval_strand>
>               <Seq-interval_id>
>                 <Seq-id>
>                   <Seq-id_gi>51860766</Seq-id_gi>
>                 </Seq-id>
>               </Seq-interval_id>
>             </Seq-interval>
>           </Seq-loc_int>
>         </Seq-loc>
>       </Gene-commentary_seqs>
> Endbase
> 
> Function --> <Prot-ref_name>
>         <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa</Prot-ref_name_E>
>         <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa (DEXH RNA helicase
> family)</Prot-ref_name_E>
>       </Prot-ref_name>
> 
> DBLink --> <Gene-ref_locus-tag>MGI:2444401</Gene-ref_locus-tag>
> <Gene-commentary_source>
>                 <Other-source>
>                   <Other-source_src>
>                     <Dbtag>
>                       <Dbtag_db>GO</Dbtag_db>
>                       <Dbtag_tag>
>                         <Object-id>
>                           <Object-id_id>5524</Object-id_id>
>                         </Object-id>
>                       </Dbtag_tag>
>                     </Dbtag>
>                   </Other-source_src>
>                   <Other-source_anchor>ATP binding</Other-source_anchor>
>                   <Other-source_post-text>evidence: ISS</Other-source_post-text>
>                 </Other-source>
>               </Gene-commentary_source>
> 
> Product-type --> <Entrezgene_type value="protein-coding">6</Entrezgene_type>
> 
> gene-comment --> <Gene-ref_desc>activating signal cointegrator 1 complex subunit 3-like
> 1</Gene-ref_desc>
> 
> synonym --> <Gene-ref_syn>
>         <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
>         <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
>         <Gene-ref_syn_E>U5-200KD</Gene-ref_syn_E>
>         <Gene-ref_syn_E>U5-200-KD</Gene-ref_syn_E>
>         <Gene-ref_syn_E>A330064G03Rik</Gene-ref_syn_E>
>       </Gene-ref_syn>
>       
> EC --> <Prot-ref_ec>
>         <Prot-ref_ec_E>1.5.1.5</Prot-ref_ec_E>
>         <Prot-ref_ec_E>3.5.4.9</Prot-ref_ec_E>
>       </Prot-ref_ec>
> 
> Chromosome: <SubSource>
>             <SubSource_subtype value="chromosome">1</SubSource_subtype>
>             <SubSource_name>6</SubSource_name>
>           </SubSource>
> 
> Some of these can occur more than once in a record.
> 
> 
> On Fri, 22 Apr 2005 02:41:46 -0400, William Park wrote:
> 
>> Willem Ligtenberg <WLigtenberg at gmail.com> wrote:
>>> On Sun, 17 Apr 2005 02:16:04 +0000, William Park wrote:
>>> > Care to post more details?
>>> 
>>> The XML file I need to parse contains information about genes.
>>> So the first element is a gene, and then there are a lot of sub-elements with
>>> sub-elements. I only need some of the information and want to store it in
>>> an object called gene. Later on this information will be printed into a
>>> file, which in its turn will be fed into some other program.
>> 
>> You have to help us a little more here.  Which info do you want to
>> extract from below example?
>> 
>>> <Entrezgene-Set>
>>> ...
>>> </Entrezgene-Set>
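
A rough sketch of per-record parsing with iterparse(), under a few
assumptions: that each record inside <Entrezgene-Set> is an <Entrezgene>
element (the thread only shows the outer element), that the
standard-library xml.etree.ElementTree stands in for cElementTree, and
that the sample document below is a made-up minimal record assembled
from the fragments quoted above. The function name iter_genes and the
dictionary keys are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Made-up minimal input built from fragments quoted in the thread;
# a real file would contain many <Entrezgene> records with deeper nesting.
SAMPLE = b"""<Entrezgene-Set>
  <Entrezgene>
    <Gene-track_geneid>320632</Gene-track_geneid>
    <Gene-ref_locus>Pzp</Gene-ref_locus>
    <Gene-ref_syn>
      <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
      <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
    </Gene-ref_syn>
  </Entrezgene>
</Entrezgene-Set>"""


def iter_genes(source, record_tag="Entrezgene"):
    """Yield one dict per record; record_tag is an assumed element name."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        yield {
            # Single-valued fields: findtext() returns None when absent.
            "id": elem.findtext(".//Gene-track_geneid"),
            "name": elem.findtext(".//Gene-ref_locus"),
            # Repeated fields: findall() returns every match as a list.
            "synonyms": [e.text for e in elem.findall(".//Gene-ref_syn_E")],
            "ec": [e.text for e in elem.findall(".//Prot-ref_ec_E")],
        }
        # Drop the record's children once it has been consumed.
        elem.clear()


genes = list(iter_genes(io.BytesIO(SAMPLE)))
print(genes)
```

Clearing each record after it has been handled is meant to keep memory
use down on large files, since the finished subtrees are no longer held
in full.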



