XML parsing per record

Willem Ligtenberg WLigtenberg at gmail.com
Fri Apr 22 07:48:15 EDT 2005


This is all the info I need from the XML file:
ID --> 	<Gene-track_geneid>320632</Gene-track_geneid>

Name --> 	<Gene-ref>
        <Gene-ref_locus>Pzp</Gene-ref_locus>
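ID and Name each occur once per record, so `findtext()` on the record element is enough for both. Below is a minimal sketch using the standard library's xml.etree.ElementTree. The `<Entrezgene>` wrapper is a placeholder for one full record, with the real file's intermediate elements left out, and `gene_id`/`gene_name` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# One record, trimmed to the two elements quoted above. The real file has
# more levels in between; the ".//" searches below do not depend on them.
RECORD = """
<Entrezgene>
  <Gene-track_geneid>320632</Gene-track_geneid>
  <Gene-ref>
    <Gene-ref_locus>Pzp</Gene-ref_locus>
  </Gene-ref>
</Entrezgene>
"""

def gene_id(record):
    """Return the <Gene-track_geneid> of one record as an int, or None."""
    text = record.findtext(".//Gene-track_geneid")
    return int(text) if text else None

def gene_name(record):
    """Return the <Gene-ref_locus> inside <Gene-ref>, or None if absent."""
    return record.findtext(".//Gene-ref/Gene-ref_locus")

record = ET.fromstring(RECORD)
print(gene_id(record), gene_name(record))  # 320632 Pzp
```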
        
Startbase --> <Gene-commentary_seqs>
        <Seq-loc>
          <Seq-loc_int>
            <Seq-interval>
              <Seq-interval_from>126957426</Seq-interval_from>
              <Seq-interval_to>126989473</Seq-interval_to>
              <Seq-interval_strand>
                <Na-strand value="plus"/>
              </Seq-interval_strand>
              <Seq-interval_id>
                <Seq-id>
                  <Seq-id_gi>51860766</Seq-id_gi>
                </Seq-id>
              </Seq-interval_id>
            </Seq-interval>
          </Seq-loc_int>
        </Seq-loc>
      </Gene-commentary_seqs>
Endbase
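Reading "Startbase" and "Endbase" as the `<Seq-interval_from>` and `<Seq-interval_to>` values (an interpretation of the block above, not something the file states), the interval can be collected as in the sketch below. A record may contain more than one such block, so it returns a list. `intervals` is a hypothetical name and the `<Entrezgene>` wrapper is a placeholder.

```python
import xml.etree.ElementTree as ET

# The <Gene-commentary_seqs> block quoted above inside a placeholder record.
RECORD = """
<Entrezgene>
  <Gene-commentary_seqs>
    <Seq-loc>
      <Seq-loc_int>
        <Seq-interval>
          <Seq-interval_from>126957426</Seq-interval_from>
          <Seq-interval_to>126989473</Seq-interval_to>
          <Seq-interval_strand>
            <Na-strand value="plus"/>
          </Seq-interval_strand>
          <Seq-interval_id>
            <Seq-id>
              <Seq-id_gi>51860766</Seq-id_gi>
            </Seq-id>
          </Seq-interval_id>
        </Seq-interval>
      </Seq-loc_int>
    </Seq-loc>
  </Gene-commentary_seqs>
</Entrezgene>
"""

def intervals(record):
    """Return (start, end, strand, gi) for every <Seq-interval> found under a
    <Gene-commentary_seqs> element; assumes both bounds are present."""
    result = []
    for iv in record.iterfind(".//Gene-commentary_seqs//Seq-interval"):
        strand = iv.find("Seq-interval_strand/Na-strand")
        result.append((
            int(iv.findtext("Seq-interval_from")),
            int(iv.findtext("Seq-interval_to")),
            strand.get("value") if strand is not None else None,
            iv.findtext("Seq-interval_id/Seq-id/Seq-id_gi"),
        ))
    return result

print(intervals(ET.fromstring(RECORD)))
# [(126957426, 126989473, 'plus', '51860766')]
```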

Function --> <Prot-ref_name>
        <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa</Prot-ref_name_E>
        <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa (DEXH RNA helicase family)</Prot-ref_name_E>
      </Prot-ref_name>

DBLink --> <Gene-ref_locus-tag>MGI:2444401</Gene-ref_locus-tag>
<Gene-commentary_source>
                <Other-source>
                  <Other-source_src>
                    <Dbtag>
                      <Dbtag_db>GO</Dbtag_db>
                      <Dbtag_tag>
                        <Object-id>
                          <Object-id_id>5524</Object-id_id>
                        </Object-id>
                      </Dbtag_tag>
                    </Dbtag>
                  </Other-source_src>
                  <Other-source_anchor>ATP binding</Other-source_anchor>
                  <Other-source_post-text>evidence: ISS</Other-source_post-text>
                </Other-source>
              </Gene-commentary_source>
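The DBLink data sits in two places: the locus tag, and one `<Dbtag>` per `<Other-source>`. A minimal sketch that returns the locus tag plus one (database, identifier, anchor text) tuple per source. `db_links` is a hypothetical helper, the tuple layout is an assumption, and the `<Entrezgene>` wrapper is a placeholder.

```python
import xml.etree.ElementTree as ET

# The two elements quoted above inside a placeholder record.
RECORD = """
<Entrezgene>
  <Gene-ref_locus-tag>MGI:2444401</Gene-ref_locus-tag>
  <Gene-commentary_source>
    <Other-source>
      <Other-source_src>
        <Dbtag>
          <Dbtag_db>GO</Dbtag_db>
          <Dbtag_tag>
            <Object-id>
              <Object-id_id>5524</Object-id_id>
            </Object-id>
          </Dbtag_tag>
        </Dbtag>
      </Other-source_src>
      <Other-source_anchor>ATP binding</Other-source_anchor>
      <Other-source_post-text>evidence: ISS</Other-source_post-text>
    </Other-source>
  </Gene-commentary_source>
</Entrezgene>
"""

def db_links(record):
    """Return (locus_tag, links); links holds one (db, id, anchor) tuple per
    <Other-source> under a <Gene-commentary_source>."""
    locus_tag = record.findtext(".//Gene-ref_locus-tag")
    links = []
    for src in record.iterfind(".//Gene-commentary_source/Other-source"):
        dbtag = src.find("Other-source_src/Dbtag")
        if dbtag is None:
            continue  # skip sources that carry no <Dbtag>
        links.append((
            dbtag.findtext("Dbtag_db"),
            dbtag.findtext("Dbtag_tag/Object-id/Object-id_id"),
            src.findtext("Other-source_anchor"),
        ))
    return locus_tag, links

print(db_links(ET.fromstring(RECORD)))
# ('MGI:2444401', [('GO', '5524', 'ATP binding')])
```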

Product-type --> <Entrezgene_type value="protein-coding">6</Entrezgene_type>
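Here the readable product type is in the `value` attribute, while the element text (`6`) appears to be a numeric code for the same thing. A sketch would therefore read the attribute with `Element.get()`. `product_type` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RECORD = '<Entrezgene><Entrezgene_type value="protein-coding">6</Entrezgene_type></Entrezgene>'

def product_type(record):
    """Return the value attribute of <Entrezgene_type>, or None if absent."""
    elem = record.find(".//Entrezgene_type")
    return elem.get("value") if elem is not None else None

print(product_type(ET.fromstring(RECORD)))  # protein-coding
```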

gene-comment --> <Gene-ref_desc>activating signal cointegrator 1 complex subunit 3-like 1</Gene-ref_desc>

synonym --> <Gene-ref_syn>
        <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
        <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
        <Gene-ref_syn_E>U5-200KD</Gene-ref_syn_E>
        <Gene-ref_syn_E>U5-200-KD</Gene-ref_syn_E>
        <Gene-ref_syn_E>A330064G03Rik</Gene-ref_syn_E>
      </Gene-ref_syn>
      
EC --> <Prot-ref_ec>
        <Prot-ref_ec_E>1.5.1.5</Prot-ref_ec_E>
        <Prot-ref_ec_E>3.5.4.9</Prot-ref_ec_E>
      </Prot-ref_ec>
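Function, synonym and EC all share one shape: a container element holding repeated `<..._E>` children. Assuming that naming pattern holds (it does for the three examples above), one hypothetical helper, `items`, can return each of them as a list. It returns an empty list when a record lacks the field. The `<Entrezgene>` wrapper is a placeholder.

```python
import xml.etree.ElementTree as ET

# Placeholder record holding the list-valued elements quoted above.
RECORD = """
<Entrezgene>
  <Prot-ref_name>
    <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa</Prot-ref_name_E>
    <Prot-ref_name_E>U5 snRNP-specific protein, 200 kDa (DEXH RNA helicase family)</Prot-ref_name_E>
  </Prot-ref_name>
  <Gene-ref_syn>
    <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
    <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
    <Gene-ref_syn_E>U5-200KD</Gene-ref_syn_E>
    <Gene-ref_syn_E>U5-200-KD</Gene-ref_syn_E>
    <Gene-ref_syn_E>A330064G03Rik</Gene-ref_syn_E>
  </Gene-ref_syn>
  <Prot-ref_ec>
    <Prot-ref_ec_E>1.5.1.5</Prot-ref_ec_E>
    <Prot-ref_ec_E>3.5.4.9</Prot-ref_ec_E>
  </Prot-ref_ec>
</Entrezgene>
"""

def items(record, container):
    """Return the text of every <container_E> child of <container>, in
    document order; [] when the record has no such container."""
    path = f".//{container}/{container}_E"
    return [e.text for e in record.iterfind(path) if e.text]

record = ET.fromstring(RECORD)
functions = items(record, "Prot-ref_name")
synonyms = items(record, "Gene-ref_syn")
ec_numbers = items(record, "Prot-ref_ec")
print(ec_numbers)  # ['1.5.1.5', '3.5.4.9']
```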

Chromosome --> <SubSource>
            <SubSource_subtype value="chromosome">1</SubSource_subtype>
            <SubSource_name>6</SubSource_name>
          </SubSource>
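`<SubSource>` is a generic subtype/name pair, so the chromosome has to be picked out by its subtype. ElementTree's XPath subset has no predicate that tests an attribute of a child element, so this sketch loops instead. `chromosome` is a hypothetical helper, and it assumes any other `<SubSource>` entries use a different subtype value.

```python
import xml.etree.ElementTree as ET

RECORD = """
<Entrezgene>
  <SubSource>
    <SubSource_subtype value="chromosome">1</SubSource_subtype>
    <SubSource_name>6</SubSource_name>
  </SubSource>
</Entrezgene>
"""

def chromosome(record):
    """Return <SubSource_name> of the first <SubSource> whose subtype value
    is "chromosome", or None if there is none."""
    for sub in record.iterfind(".//SubSource"):
        subtype = sub.find("SubSource_subtype")
        if subtype is not None and subtype.get("value") == "chromosome":
            return sub.findtext("SubSource_name")
    return None

print(chromosome(ET.fromstring(RECORD)))  # 6
```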

Some of these elements can occur more than once in a record.
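To handle the file one record at a time, one option is `xml.etree.ElementTree.iterparse`. It builds a gene object when each `<Entrezgene>` end tag arrives, then clears the finished records so the whole file never has to sit in memory. This is a minimal sketch. The `Gene` class and its fields are hypothetical stand-ins for the gene object mentioned in the quoted message, only three of the fields listed above are filled in, and the second test record is made up.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Gene:
    # Hypothetical container for one record; extend with the other fields.
    gene_id: str = ""
    name: str = ""
    synonyms: list = field(default_factory=list)

def iter_genes(source):
    """Yield one Gene per <Entrezgene> element of source (a filename or a
    binary file object), parsing incrementally instead of loading the file."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Entrezgene":
            yield Gene(
                gene_id=elem.findtext(".//Gene-track_geneid", default=""),
                name=elem.findtext(".//Gene-ref_locus", default=""),
                synonyms=[e.text for e in elem.iterfind(".//Gene-ref_syn_E")],
            )
            root.clear()  # discard finished records so memory use stays bounded

# Two small records: the first uses values quoted above, the second is made up.
DATA = b"""<Entrezgene-Set>
  <Entrezgene>
    <Gene-track_geneid>320632</Gene-track_geneid>
    <Gene-ref>
      <Gene-ref_locus>Pzp</Gene-ref_locus>
      <Gene-ref_syn>
        <Gene-ref_syn_E>HELIC2</Gene-ref_syn_E>
        <Gene-ref_syn_E>KIAA0788</Gene-ref_syn_E>
      </Gene-ref_syn>
    </Gene-ref>
  </Entrezgene>
  <Entrezgene>
    <Gene-track_geneid>1</Gene-track_geneid>
  </Entrezgene>
</Entrezgene-Set>"""

for gene in iter_genes(io.BytesIO(DATA)):
    print(gene)
```

Fields that repeat are gathered into lists inside each record, so the one-object-per-record shape holds even when an element occurs several times.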


On Fri, 22 Apr 2005 02:41:46 -0400, William Park wrote:

> Willem Ligtenberg <WLigtenberg at gmail.com> wrote:
>> On Sun, 17 Apr 2005 02:16:04 +0000, William Park wrote:
>> > Care to post more details?
>> 
>> The XML file I need to parse contains information about genes.
>> So the first element is a gene and then there are a lot of sub-elements with
>> sub-elements. I only need some of the information and want to store it in
>> an object called gene. Later on this information will be printed into a
>> file, which in its turn will be fed into some other program.
> 
> You have to help us a little more here.  Which info do you want to
> extract from below example?
> 
>> <Entrezgene-Set>
>> ...
>> </Entrezgene-Set>
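For the last step in the quoted message (printing the gene objects to a file for another program), the standard csv module with a tab delimiter is one option. The column layout and the ';' separator for list fields are assumptions, since the downstream program's input format is not stated. `Gene` and `write_genes` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Gene:
    # Hypothetical record type; only three of the wanted fields are shown.
    gene_id: str
    name: str
    synonyms: list = field(default_factory=list)

def write_genes(genes, out):
    """Write a header plus one tab-separated line per gene to the text
    stream out; list-valued fields are joined with ';' (assumed format)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "name", "synonyms"])
    for g in genes:
        writer.writerow([g.gene_id, g.name, ";".join(g.synonyms)])

buf = io.StringIO()
write_genes([Gene("320632", "Pzp", ["HELIC2", "KIAA0788"])], buf)
print(buf.getvalue(), end="")
```

For a real output file, open it with `open(path, "w", newline="")`, as the csv module documentation recommends, and pass that instead of the StringIO buffer.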



