identifying and parsing string in text file

Bernard bernard.chhun at gmail.com
Sat Mar 8 15:01:37 EST 2008


Hey Bryan,

It seems the text you are trying to parse is similar to XML/HTML.
So I'd use BeautifulSoup[1] if I were you :)

Here's some sample code for your scraping case:

<python>

from BeautifulSoup import BeautifulSoup

# assume the s variable has your text
s = "whatever xml or html here"
# turn it into a tasty & parsable soup :)
soup = BeautifulSoup(s)
# for every element tag in the soup
for el in soup.findAll("element"):
    # print out its tag & name attributes plus its inner value!
    print el["tag"], el["name"], el.string

</python>

That's it!
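
If you'd rather not install anything, here is a rough sketch of the same idea using only the standard library's html.parser module, written in Python 3 syntax. It filters on the tag attribute, keeps every match (so repeated tags with different values are all recorded) and prints "name = value" lines. The names ElementCollector and extract are made up for this example, and the second <element> line in the sample is invented test data, not something from your file.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collect (tag, name, value) triples from <element ...>value</element> entries."""

    def __init__(self):
        super().__init__()
        self.records = []    # every match, in file order (duplicates kept)
        self._attrs = None   # attributes of the <element> currently open
        self._chunks = []    # text pieces seen inside it

    def handle_starttag(self, tag, attrs):
        if tag == "element":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        # only keep text that sits inside an open <element>
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "element" and self._attrs is not None:
            value = "".join(self._chunks).strip()
            self.records.append(
                (self._attrs.get("tag"), self._attrs.get("name"), value))
            self._attrs = None


def extract(text, wanted_tag=None):
    """Return (name, value) pairs, optionally only for one tag such as "300a,0014"."""
    parser = ElementCollector()
    parser.feed(text)
    parser.close()
    return [(name, value) for tag, name, value in parser.records
            if wanted_tag is None or tag == wanted_tag]


# first entry is the line from the question; the second one is made up
sample = '''<element tag="300a,0014" vr="CS" vm="1" len="4"
name="DoseReferenceStructureType">SITE</element>
<element tag="300a,0014" vr="CS" vm="1" len="5"
name="DoseReferenceStructureType">POINT</element>'''

for name, value in extract(sample, "300a,0014"):
    print("%s = %s" % (name, value))
```

To get the output file you described, read your input with open(...).read(), pass it to extract, and write each "name = value" line to a file opened in "w" mode instead of printing it. Feeding the parser the whole text rather than one line at a time matters here, because your elements are split across two lines.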

[1] http://www.crummy.com/software/BeautifulSoup/

On Mar 8, 14:49, "Bryan.Fodn... at gmail.com" <Bryan.Fodn... at gmail.com>
wrote:
> I have a large file that has many lines like this,
>
> <element tag="300a,0014" vr="CS" vm="1" len="4"
> name="DoseReferenceStructureType">SITE</element>
>
> I would like to identify the line by the tag (300a,0014) and then grab
> the name (DoseReferenceStructureType) and value (SITE).
>
> I would like to create a file that would have the structure,
>
>      DoseReferenceStructureType = Site
>      ...
>      ...
>
> Also, there is a possibility that there are multiple lines with the
> same tag, but different values.  These all need to be recorded.
>
> So far, I have a little bit of code to look at everything that is
> available,
>
>      for line in open(str(sys.argv[1])):
>           i_line = line.split()
>           if i_line:
>                if i_line[0] == "<element":
>                     a = i_line[1]
>                     b = i_line[5]
>                     print "%s     |     %s" %(a, b)
>
> but do not see a clever way of doing what I would like.
>
> Any help or guidance would be appreciated.
>
> Bryan



