[Tutor] module to parse XMLish text?

Karim karim.liateni at free.fr
Fri Jan 14 09:27:49 CET 2011


Hello,

    from xml.etree.ElementTree import ElementTree

    # Parsing:
    doc = ElementTree()
    doc.parse(xmlFile)

    # Find tag element:
    doc.find('mytag')

    # Iteration over tag element:
    lname = []
    for lib in doc.iter('LibTag'):
        libName = lib.attrib['name']
        lname.append(libName)

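For reference, here is a self-contained variant of the snippet above that can be run as-is. It is only a sketch: the XML string and the `root` element are made up for illustration, and the tag names `mytag` and `LibTag` and the `name` attribute are just the placeholders used above. Note that this approach needs well-formed XML; the sample quoted below is not (a tag such as <BING ZEBRA> is never closed), so ElementTree would be expected to raise a ParseError on it.

```python
import io
from xml.etree.ElementTree import ElementTree

# Hypothetical, well-formed XML used only for illustration.
xml_text = """<root>
  <mytag>hello</mytag>
  <LibTag name="libA"/>
  <LibTag name="libB"/>
</root>"""

# Parsing: parse() accepts a file name or a file-like object.
doc = ElementTree()
doc.parse(io.StringIO(xml_text))

# Find the first direct child of the root named 'mytag'.
first = doc.find('mytag')

# Iterate over every 'LibTag' element and collect its 'name' attribute.
lib_names = [lib.attrib['name'] for lib in doc.iter('LibTag')]

print(first.text)
print(lib_names)
```
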
Regards
Karim

On 01/14/2011 03:55 AM, Terry Carroll wrote:
> Does anyone know of a module that can parse out text with XML-like 
> tags as in the example below?  I emphasize the "-like" in "XML-like".  
> I don't think I can parse this as XML (can I?).
>
> Sample text between the dashed lines:
>
> ---------------------------------
> Blah, blah, blah
> <AAA>
> <BING ZEBRA>
> <BANG ROOSTER>
> <BOOM GARBONZO BEAN>
> <BLIP>SOMETHING ELSE</BLIP>
> <BASH>SOMETHING DIFFERENT</BASH>
> </AAA>
> ---------------------------------
>
> I'd like to be able to have a dictionary (or any other structure, 
> really; as long as I can get to the parsed-out pieces) that would look 
> something like:
>
>  {"BING" : "ZEBRA",
>   "BANG" : "ROOSTER",
>   "BOOM" : "GARBONZO BEAN",
>   "BLIP" : "SOMETHING ELSE",
>   "BASH" : "SOMETHING DIFFERENT"}
>
> The "Blah, blah, blah" can be tossed away, for all I care.
>
> The basic rule is that the tag either has an operand (e.g., <BING 
> ZEBRA>), in which case the name is the first word and the content is 
> everything else that follows in the tag; or else the tag has no 
> operand, in which case it is matched to a corresponding closing tag 
> (e.g., <BLIP>SOMETHING ELSE</BLIP>), and the content is the material 
> between the two tags.
>
> I think I can assume there are no nested tags.
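
The rule described above, together with the no-nesting assumption, can be sketched with a single regular expression from the standard `re` module. This is a minimal sketch, not an existing module: `parse_xmlish` and `TAG` are hypothetical names, tag names are assumed to consist of word characters only, and a paired tag whose content contains another tag (such as the <AAA> wrapper in the sample) is deliberately skipped.

```python
import re

# Hypothetical pattern for the two tag forms described in the message:
#   <NAME operand ...>        name is the first word, content is the rest
#   <NAME>content</NAME>      content is the text between the two tags
# Paired content may not contain '<', so a wrapper such as <AAA> is skipped.
TAG = re.compile(
    r"<(?P<op_name>\w+)\s+(?P<op_value>[^<>]+?)\s*>"
    r"|<(?P<name>\w+)>(?P<value>[^<]*)</(?P=name)>"
)

def parse_xmlish(text):
    """Return a dict mapping tag names to content (assumes no nested tags)."""
    result = {}
    for m in TAG.finditer(text):
        if m.group("op_name"):
            # Operand form: <BING ZEBRA>
            result[m.group("op_name")] = m.group("op_value")
        else:
            # Paired form: <BLIP>SOMETHING ELSE</BLIP>
            result[m.group("name")] = m.group("value").strip()
    return result

sample = """Blah, blah, blah
<AAA>
<BING ZEBRA>
<BANG ROOSTER>
<BOOM GARBONZO BEAN>
<BLIP>SOMETHING ELSE</BLIP>
<BASH>SOMETHING DIFFERENT</BASH>
</AAA>
"""

parsed = parse_xmlish(sample)
print(parsed)
```

Text outside any tag (the "Blah, blah, blah" line) is simply never matched, so it is thrown away. If a tag name repeats, the later occurrence overwrites the earlier one; collecting values in a list would be the obvious change if duplicates matter.
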
>
> I could write a state machine to do this, I suppose, but life's short, 
> and I'd rather not re-invent the wheel, if there's a wheel lying 
> around somewhere.
