identifying and parsing string in text file

Nemesis nemesis at nowhere.invalid
Sat Mar 8 15:02:35 EST 2008


Bryan.Fodness at gmail.com wrote:

> I have a large file that has many lines like this,
>
> <element tag="300a,0014" vr="CS" vm="1" len="4"
> name="DoseReferenceStructureType">SITE</element>
>
> I would like to identify the line by the tag (300a,0014) and then grab
> the name (DoseReferenceStructureType) and value (SITE).
>
> I would like to create a file that would have the structure,
>
>      DoseReferenceStructureType = Site
>      ...
>      ...

You should try regular expressions or, if it is something like XML, there
is surely a library you can use to parse it ...
Anyway, you can try something simpler like this:

       import sys

       elem_dic = {}
       with open(sys.argv[1]) as infile:
           for line in infile:
               # split on whitespace, then keep only the key=value pieces
               for item in line.split():
                   item_splitted = item.split("=", 1)
                   if len(item_splitted) > 1:
                       elem_dic[item_splitted[0]] = item_splitted[1]

... then you have to retrieve from the dict the items you need (do it inside
the loop, because every new line overwrites the same keys). For example,
with the line you posted you obtain these split items:

['<element']
['tag', '"300a,0014"']
['vr', '"CS"']
['vm', '"1"']
['len', '"4"']
['name', '"DoseReferenceStructureType">SITE</element>']

and elem_dic will contain the last five, with the keys
'tag', 'vr', 'vm', 'len', 'name' and the values '"300a,0014"' etc. (the double
quotes are kept, and the 'name' value still has the element text attached),
i.e. this:

{'vr': '"CS"', 'tag': '"300a,0014"', 'vm': '"1"', 'len': '"4"', 'name': '"DoseReferenceStructureType">SITE</element>'}
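
For the regular-expression route, a minimal sketch could look like the
following. It assumes every attribute value is double-quoted, that tag is
the first attribute and comes before name, and that the element text
contains no nested markup. The names ELEMENT_RE and extract are made up for
this example; they are not from any library.

```python
import re

# Assumed layout: <element tag="..." ... name="...">VALUE</element>
ELEMENT_RE = re.compile(
    r'<element\s+tag="(?P<tag>[^"]*)"'     # tag attribute, e.g. 300a,0014
    r'[^>]*?\bname="(?P<name>[^"]*)"'      # name attribute, skipping vr/vm/len
    r'[^>]*>(?P<value>[^<]*)</element>'    # text between the tags
)

def extract(text, wanted_tag):
    """Return 'name = value' strings for every element with the given tag."""
    results = []
    for match in ELEMENT_RE.finditer(text):
        if match.group("tag") == wanted_tag:
            results.append("%s = %s" % (match.group("name"),
                                        match.group("value")))
    return results

sample = ('<element tag="300a,0014" vr="CS" vm="1" len="4" '
          'name="DoseReferenceStructureType">SITE</element>')
print(extract(sample, "300a,0014"))
# prints ['DoseReferenceStructureType = SITE']
```

Because the pattern never crosses a ">" inside the opening tag, it also
works when the element is wrapped over two lines, as long as you pass the
whole file contents (open(...).read()) rather than one line at a time.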




-- 
Age is not a particularly interesting subject. Anyone can get old. All
you have to do is live long enough.



