finding the byte offset of an element in an XML file (tell() and seek()?)

Ben Temperton btemperton at gmail.com
Thu Jun 14 15:27:38 EDT 2012


Hi there,

I am working with mass spectrometry data in the mzXML format, which looks like this:
<mzXML>
    <msRun>
      <scan num="1">...</scan>
      <scan num="2">...</scan>
      <scan num="3">...</scan>
      <scan num="4">...</scan>
     .....
    </msRun>
    <index>
        <offset id="1">160409990</offset>
        <offset id="2">160442725</offset>
        <offset id="3">160474927</offset>
        <offset id="4">160497386</offset>
        ....
    </index>
</mzXML>

Each offset element contains the byte offset of the scan element whose num matches its id. I am trying to write a Python script that removes some scan elements along with their corresponding offset elements, but I can't figure out how to recalculate the byte offsets of the remaining scan elements once the others have been removed.

My plan was to write the file out, then read it back in and search it for a particular string (e.g. '<scan num="1">'), using the tell() method to get the current byte position in the file. However, I'm not sure how to implement this.
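
A minimal sketch of that plan, assuming Python 3 and a file small enough to read into memory in one go. The names scan_offsets and SCAN_START are made up for illustration, and the pattern assumes every scan tag is written exactly as <scan num="N", with num as the first attribute. Instead of tell(), it takes the match positions in the raw bytes, which amount to the same byte offsets as long as the file is opened in binary mode:

```python
import re

# Matches the opening of a scan element and captures its num attribute.
# Assumption: num is the first attribute and is written as num="N".
SCAN_START = re.compile(rb'<scan\s+num="(\d+)"')


def scan_offsets(path):
    """Return {scan number: byte offset of its '<scan' tag} for the file at path."""
    # Binary mode, so positions are byte offsets rather than character counts.
    with open(path, "rb") as handle:
        data = handle.read()
    # m.start() is the position of the '<' that opens each scan element.
    return {int(m.group(1)): m.start() for m in SCAN_START.finditer(data)}
```

The resulting dictionary could then be used to rewrite the text of each remaining offset element. In the layout shown above the index comes after all the scan elements, so changing the offset values should not shift the scans themselves, though that is worth checking against a real file.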

Any ideas?

Many thanks,

Ben


