From flahertyk1 at hotmail.com Thu Feb 4 05:04:47 2010
From: flahertyk1 at hotmail.com (kimmyaf)
Date: Wed, 3 Feb 2010 20:04:47 -0800 (PST)
Subject: [XML-SIG] parsing XML with minidom
Message-ID: <27447458.post@talk.nabble.com>

Hello, I am not real sure if my question belongs here or not, but this is
best place I could find.

I am a python beginner and trying to teach myself how to parse some XML with
minidom.

This is the code excerpt I am struggling with....

********************************************************
    dom = minidom.parseString(xml_response)
    handler.close()

    route_list = []
    tag = ['route']

    tmp_route=[]
    for route in dom.getElementsByTagName('body'):
        print 'in'
        tmp_route[route] = dom.getElementsByTagName(tag)[0].getAttribute('tag')
        route_list.append(tmp_route)

*******************************************************************
Here is the XML I am getting back when I call...

' \r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n'

See this formatted better by pasting this URL = >
http://webservices.nextbus.com/service/publicXMLFeed?command=routeList&a=mbta

I am taking the following error:

  File "C:/Users/Kim/Grad School/Python/bus python.py", line 54, in
    get_available_routes()
  File "C:/Users/Kim/Grad School/Python/bus python.py", line 43, in get_available_routes
    tmp_route[route] = dom.getElementsByTagName(tag)[0].getAttribute('tag')
IndexError: list index out of range

I'm sure there is something obvious that I am doing wrong. All I want to do
is grab all of the values and put them into a list. Kind of new to parsing
XML! I'm working off an example but the XML in the example code is a lot more
in depth so can't really relate it to mine. I also would like any reference
anyone has about how to parse with minidom!!

Help! Thank you! %-|
--
View this message in context: http://old.nabble.com/parsing-XML-with-minidom-tp27447458p27447458.html
Sent from the Python - xml-sig mailing list archive at Nabble.com.
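The goal described above (collect the `tag` attribute of every `route` element into a list) can be sketched as follows. This is only an illustration, written for Python 3 although the thread itself uses Python 2 `print` statements. `SAMPLE_XML` and `get_route_tags` are hypothetical names, and the sample document is an assumption: the thread shows `title="111"` and `title="116"` attributes, but the `tag` values used here are made up.

```python
from xml.dom import minidom

# Hypothetical sample shaped like the routeList response; the real feed
# is not reproduced in the thread, so the tag values are assumptions.
SAMPLE_XML = (
    '<body>\n'
    '<route tag="111" title="111"/>\n'
    '<route tag="116" title="116"/>\n'
    '</body>\n'
)


def get_route_tags(xml_text):
    """Return the 'tag' attribute of every <route> element, in document order."""
    dom = minidom.parseString(xml_text)
    # getElementsByTagName expects a single string, not a list of strings.
    return [route.getAttribute('tag')
            for route in dom.getElementsByTagName('route')]


route_list = get_route_tags(SAMPLE_XML)
print(route_list)

# Passing a list instead of a string matches no element, so indexing the
# result with [0] is what raises "IndexError: list index out of range".
no_match = minidom.parseString(SAMPLE_XML).getElementsByTagName(['route'])
print(len(no_match))
```

The list comprehension replaces the original `tmp_route[route] = ...` assignment, which would also fail because a list cannot be indexed by a DOM node.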
From rajanikanth at gmail.com Thu Feb 4 07:11:25 2010
From: rajanikanth at gmail.com (Rajanikanth Jammalamadaka)
Date: Wed, 3 Feb 2010 22:11:25 -0800
Subject: [XML-SIG] parsing XML with minidom
In-Reply-To: <27447458.post@talk.nabble.com>
References: <27447458.post@talk.nabble.com>
Message-ID: <84bdef3c1002032211l23fe60bi4681dca06f18bc04@mail.gmail.com>

Try this:

from xml.etree.ElementTree import ElementTree
doc = ElementTree(file = "t.xml")
listOfTags = []
for item in doc.findall(".//route"):
    listOfTags.append(item.get('tag'))
print listOfTags

where t.xml is your xml file.

Thanks,

Raj

On Wed, Feb 3, 2010 at 8:04 PM, kimmyaf wrote:

>
> Hello, I am not real sure if my question belongs here or not, but this is
> best place I could find.
>
> I am a python beginner and trying to teach myself how to parse some XML
> with
> minidom.
>
> This is the code excerpt I am struggling with....
>
> ********************************************************
>     dom = minidom.parseString(xml_response)
>     handler.close()
>
>     route_list = []
>     tag = ['route']
>
>     tmp_route=[]
>     for route in dom.getElementsByTagName('body'):
>         print 'in'
>         tmp_route[route] =
> dom.getElementsByTagName(tag)[0].getAttribute('tag')
>         route_list.append(tmp_route)
>
> *******************************************************************
> Here is the XML I am getting back when I call...
>
> ' \r\n\r\n\r\n title="111"/>\r\n\r\n title="116"/>\r\n\r\n\r\n'
>
> See this formatted better by pasting this URL = >
>
>
> http://webservices.nextbus.com/service/publicXMLFeed?command=routeList&a=mbta
>
>
> I am taking the following error:
>
>   File "C:/Users/Kim/Grad School/Python/bus python.py", line 54, in
>     get_available_routes()
>   File "C:/Users/Kim/Grad School/Python/bus python.py", line 43, in
> get_available_routes
>     tmp_route[route] = dom.getElementsByTagName(tag)[0].getAttribute('tag')
> IndexError: list index out of range
>
>
>
> I'm sure there is something obvious that I am doing wrong. All I want to do
> is grab all of the values and put them into a list. Kind of
> new
> to parsing XML! I'm working off an example but the XML in the example code
> is a lot more in depth so can't really relate it to mine. I also would like
> any reference anyone has about how to parse with minidom!!
>
> Help! Thank you! %-|
> --
> View this message in context:
> http://old.nabble.com/parsing-XML-with-minidom-tp27447458p27447458.html
> Sent from the Python - xml-sig mailing list archive at Nabble.com.
>
> _______________________________________________
> XML-SIG maillist - XML-SIG at python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>

--
Rajanikanth

From bigotp at acm.org Thu Feb 4 13:43:16 2010
From: bigotp at acm.org (Peter A. Bigot)
Date: Thu, 04 Feb 2010 05:43:16 -0700
Subject: [XML-SIG] parsing XML with minidom
In-Reply-To: <27447458.post@talk.nabble.com>
References: <27447458.post@talk.nabble.com>
Message-ID: <4B6AC0E4.2010601@acm.org>

The variable tag is a list of strings. The method getElementsByTagName
takes a single string as its first parameter. Since a list cannot appear
as a tag name, the second call to getElementsByTagName returns an empty
list.

body = dom.getElementsByTagName('body')[0]
for route in body.getElementsByTagName('route'):
    print route.getAttribute('tag')

Peter

On 2/3/2010 9:04 PM, kimmyaf wrote:
> Hello, I am not real sure if my question belongs here or not, but this is
> best place I could find.
>
> I am a python beginner and trying to teach myself how to parse some XML with
> minidom.
>
> This is the code excerpt I am struggling with....
>
> ********************************************************
>     dom = minidom.parseString(xml_response)
>     handler.close()
>
>     route_list = []
>     tag = ['route']
>
>     tmp_route=[]
>     for route in dom.getElementsByTagName('body'):
>         print 'in'
>         tmp_route[route] =
> dom.getElementsByTagName(tag)[0].getAttribute('tag')
>         route_list.append(tmp_route)
>
> *******************************************************************
> Here is the XML I am getting back when I call...
>
> ' \r\n\r\n\r\n title="111"/>\r\n\r\n title="116"/>\r\n\r\n\r\n'
>
> See this formatted better by pasting this URL =>
>
> http://webservices.nextbus.com/service/publicXMLFeed?command=routeList&a=mbta
>
>
> I am taking the following error:
>
>   File "C:/Users/Kim/Grad School/Python/bus python.py", line 54, in
>     get_available_routes()
>   File "C:/Users/Kim/Grad School/Python/bus python.py", line 43, in
> get_available_routes
>     tmp_route[route] = dom.getElementsByTagName(tag)[0].getAttribute('tag')
> IndexError: list index out of range
>
>
>
> I'm sure there is something obvious that I am doing wrong. All I want to do
> is grab all of the values and put them into a list. Kind of new
> to parsing XML! I'm working off an example but the XML in the example code
> is a lot more in depth so can't really relate it to mine. I also would like
> any reference anyone has about how to parse with minidom!!
>
> Help! Thank you! %-|
>

From james.johnston at lifeway.com Thu Feb 18 21:42:10 2010
From: james.johnston at lifeway.com (James Johnston)
Date: Thu, 18 Feb 2010 14:42:10 -0600
Subject: [XML-SIG] Python working with XML
Message-ID: <710c80bb1002181242g5c0df17aid5710a1574761871@mail.gmail.com>

I want to develop some tools for working with XML document files. This
might require the validation of XML rules and converting various formats
into XML. What is new and available? Some of the things I am reading are
from 2000 - 2001.

Thanks

--
James Johnston
Retail Technologies
(615) 251-2792
james.johnston at lifeway.com

From nimmyliji at gmail.com Tue Feb 9 10:18:05 2010
From: nimmyliji at gmail.com (nimmyliji)
Date: Tue, 9 Feb 2010 01:18:05 -0800 (PST)
Subject: [XML-SIG] Send an xml file
Message-ID: <27512403.post@talk.nabble.com>

Hi,

How can I send an xml file from python to flex 3? Any one can help me....
With example codes....

Thanks in advance

nimyliji
--
View this message in context: http://old.nabble.com/Send-an-xml-file-tp27512403p27512403.html
Sent from the Python - xml-sig mailing list archive at Nabble.com.

From vnocciolini at mbigroup.it Wed Feb 17 11:47:40 2010
From: vnocciolini at mbigroup.it (Vinicio Nocciolini)
Date: Wed, 17 Feb 2010 11:47:40 +0100
Subject: [XML-SIG] PyXML-0.8.4 error
Message-ID: <4B7BC94C.9020608@mbigroup.it>

Hi

I am using Ubuntu 9.10

This is the error output

regards Vinicio

PyXML-0.8.4$ python2.5 setup.py build
running build
running build_py
running build_ext
building '_xmlplus.parsers.pyexpat' extension
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DXML_NS=1 -DXML_DTD=1 -DBYTEORDER=1234 -DXML_CONTEXT_BYTES=1024 -DHAVE_MEMMOVE=1 -Iextensions/expat/lib -I/usr/include/python2.5 -c extensions/pyexpat.c -o build/temp.linux-i686-2.5/extensions/pyexpat.o
extensions/pyexpat.c:5:20: error: Python.h: No such file or directory
extensions/pyexpat.c:8:21: error: compile.h: No such file or directory
extensions/pyexpat.c:9:25: error: frameobject.h: No such file or directory
extensions/pyexpat.c:63: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token
extensions/pyexpat.c:70: error: expected specifier-qualifier-list before ?PyObject_HEAD?
extensions/pyexpat.c:89: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?Xmlparsetype?
extensions/pyexpat.c:98: error: expected specifier-qualifier-list before ?PyCodeObject?
extensions/pyexpat.c:108: error: expected ?)? before ?*? token extensions/pyexpat.c:123: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?have_handler?: extensions/pyexpat.c:150: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:150: error: (Each undeclared identifier is reported only once extensions/pyexpat.c:150: error: for each function it appears in.) extensions/pyexpat.c:150: error: ?handler? undeclared (first use in this function) extensions/pyexpat.c:150: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c: At top level: extensions/pyexpat.c:154: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:201: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:214: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?flag_error?: extensions/pyexpat.c:248: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c: At top level: extensions/pyexpat.c:252: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:305: error: expected ?)? before ?*? token extensions/pyexpat.c:332: error: expected ?)? before ?*? token extensions/pyexpat.c:367: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:419: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?call_character_handler?: extensions/pyexpat.c:444: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:444: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:445: error: ?temp? undeclared (first use in this function) extensions/pyexpat.c:447: warning: implicit declaration of function ?PyTuple_New? 
extensions/pyexpat.c:455: warning: implicit declaration of function ?conv_string_len_to_utf8? extensions/pyexpat.c:458: warning: implicit declaration of function ?Py_DECREF? extensions/pyexpat.c:462: warning: implicit declaration of function ?PyTuple_SET_ITEM? extensions/pyexpat.c:464: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:465: warning: implicit declaration of function ?call_with_frame? extensions/pyexpat.c:465: warning: implicit declaration of function ?getcode? extensions/pyexpat.c:466: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:468: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?flush_character_buffer?: extensions/pyexpat.c:482: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:482: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c:484: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:484: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c:485: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c: In function ?my_CharacterDataHandler?: extensions/pyexpat.c:493: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:496: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c:496: error: ?xmlparseobject? has no member named ?buffer_size? extensions/pyexpat.c:505: error: ?xmlparseobject? has no member named ?buffer_size? extensions/pyexpat.c:507: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c:510: warning: implicit declaration of function ?memcpy? extensions/pyexpat.c:510: warning: incompatible implicit declaration of built-in function ?memcpy? extensions/pyexpat.c:510: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:510: error: ?xmlparseobject? has no member named ?buffer_used? 
extensions/pyexpat.c:512: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c: In function ?my_StartElementHandler?: extensions/pyexpat.c:524: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:524: error: ?container? undeclared (first use in this function) extensions/pyexpat.c:524: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:524: warning: left-hand operand of comma expression has no effect extensions/pyexpat.c:524: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:524: warning: left-hand operand of comma expression has no effect extensions/pyexpat.c:532: error: ?xmlparseobject? has no member named ?specified_attributes? extensions/pyexpat.c:533: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c:541: error: ?xmlparseobject? has no member named ?ordered_attributes? extensions/pyexpat.c:542: warning: implicit declaration of function ?PyList_New? extensions/pyexpat.c:544: warning: implicit declaration of function ?PyDict_New? extensions/pyexpat.c:550: error: ?n? undeclared (first use in this function) extensions/pyexpat.c:550: warning: implicit declaration of function ?string_intern? extensions/pyexpat.c:551: error: ?v? undeclared (first use in this function) extensions/pyexpat.c:557: warning: implicit declaration of function ?conv_string_to_utf8? extensions/pyexpat.c:564: error: ?xmlparseobject? has no member named ?ordered_attributes? extensions/pyexpat.c:565: warning: implicit declaration of function ?PyList_SET_ITEM? extensions/pyexpat.c:568: warning: implicit declaration of function ?PyDict_SetItem? extensions/pyexpat.c:579: warning: implicit declaration of function ?Py_BuildValue? extensions/pyexpat.c:585: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:587: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:588: error: ?xmlparseobject? has no member named ?in_callback? 
extensions/pyexpat.c: In function ?my_EndElementHandler?: extensions/pyexpat.c:636: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:636: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:636: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:636: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:636: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:636: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_ProcessingInstructionHandler?: extensions/pyexpat.c:640: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:640: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:640: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:640: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:640: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:640: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:640: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_UnparsedEntityDeclHandler?: extensions/pyexpat.c:646: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:646: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:646: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:646: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:646: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:646: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_EntityDeclHandler?: extensions/pyexpat.c:659: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:659: error: ?args? 
undeclared (first use in this function) extensions/pyexpat.c:659: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:659: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:659: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:659: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_XmlDeclHandler?: extensions/pyexpat.c:696: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:696: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:696: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:696: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:696: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:696: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:696: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: At top level: extensions/pyexpat.c:705: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?my_ElementDeclHandler?: extensions/pyexpat.c:737: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:737: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:740: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:741: error: ?modelobj? undeclared (first use in this function) extensions/pyexpat.c:741: error: ?nameobj? undeclared (first use in this function) extensions/pyexpat.c:741: warning: left-hand operand of comma expression has no effect extensions/pyexpat.c:751: warning: implicit declaration of function ?conv_content_model? extensions/pyexpat.c:751: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:769: error: ?xmlparseobject? has no member named ?in_callback? 
extensions/pyexpat.c:771: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:772: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:780: warning: implicit declaration of function ?Py_XDECREF? extensions/pyexpat.c:781: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c: In function ?my_AttlistDeclHandler?: extensions/pyexpat.c:785: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:785: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:785: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:785: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:785: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:785: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:785: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_SkippedEntityHandler?: extensions/pyexpat.c:798: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:798: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:798: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:798: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:798: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:798: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_NotationDeclHandler?: extensions/pyexpat.c:806: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:806: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:806: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:806: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:806: error: ?xmlparseobject? has no member named ?handlers? 
extensions/pyexpat.c:806: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_StartNamespaceDeclHandler?: extensions/pyexpat.c:816: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:816: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:816: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:816: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:816: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:816: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_EndNamespaceDeclHandler?: extensions/pyexpat.c:823: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:823: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:823: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:823: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:823: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:823: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_CommentHandler?: extensions/pyexpat.c:828: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:828: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:828: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:828: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:828: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:828: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:828: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_StartCdataSectionHandler?: extensions/pyexpat.c:832: error: ?PyObject? 
undeclared (first use in this function) extensions/pyexpat.c:832: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:832: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:832: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:832: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:832: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_EndCdataSectionHandler?: extensions/pyexpat.c:836: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:836: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:836: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:836: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:836: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:836: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_DefaultHandler?: extensions/pyexpat.c:841: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:841: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:841: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:841: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:841: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:841: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_DefaultHandlerExpandHandler?: extensions/pyexpat.c:845: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:845: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:845: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:845: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:845: error: ?xmlparseobject? 
has no member named ?handlers? extensions/pyexpat.c:845: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_NotStandaloneHandler?: extensions/pyexpat.c:862: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:862: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:862: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:862: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:862: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:862: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:862: warning: implicit declaration of function ?PyInt_AsLong? extensions/pyexpat.c: In function ?my_ExternalEntityRefHandler?: extensions/pyexpat.c:866: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:866: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:866: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:866: error: ?conv_string_to_utf8? undeclared (first use in this function) extensions/pyexpat.c:866: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:866: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:866: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: In function ?my_StartDoctypeDeclHandler?: extensions/pyexpat.c:881: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:881: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:881: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:881: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:881: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:881: error: ?xmlparseobject? has no member named ?in_callback? 
extensions/pyexpat.c: In function ?my_EndDoctypeDeclHandler?: extensions/pyexpat.c:889: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:889: error: ?args? undeclared (first use in this function) extensions/pyexpat.c:889: error: ?rv? undeclared (first use in this function) extensions/pyexpat.c:889: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c:889: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:889: error: ?xmlparseobject? has no member named ?in_callback? extensions/pyexpat.c: At top level: extensions/pyexpat.c:893: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:912: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:930: error: expected declaration specifiers or ?...? before ?PyObject? extensions/pyexpat.c: In function ?readinst?: extensions/pyexpat.c:932: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:932: error: ?arg? undeclared (first use in this function) extensions/pyexpat.c:933: error: ?bytes? undeclared (first use in this function) extensions/pyexpat.c:934: error: ?str? undeclared (first use in this function) extensions/pyexpat.c:937: warning: implicit declaration of function ?PyInt_FromLong? extensions/pyexpat.c:948: warning: implicit declaration of function ?PyObject_CallObject? extensions/pyexpat.c:948: error: ?meth? undeclared (first use in this function) extensions/pyexpat.c:956: warning: implicit declaration of function ?PyString_Check? extensions/pyexpat.c:957: warning: implicit declaration of function ?PyErr_Format? extensions/pyexpat.c:957: error: ?PyExc_TypeError? undeclared (first use in this function) extensions/pyexpat.c:962: warning: implicit declaration of function ?PyString_GET_SIZE? extensions/pyexpat.c:964: error: ?PyExc_ValueError? 
undeclared (first use in this function) extensions/pyexpat.c:970: warning: incompatible implicit declaration of built-in function ?memcpy? extensions/pyexpat.c:970: warning: implicit declaration of function ?PyString_AsString? extensions/pyexpat.c:970: warning: passing argument 2 of ?memcpy? makes pointer from integer without a cast extensions/pyexpat.c:970: note: expected ?const void *? but argument is of type ?int? extensions/pyexpat.c: At top level: extensions/pyexpat.c:981: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1044: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1062: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1077: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1108: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1203: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1223: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1242: error: array type has incomplete element type extensions/pyexpat.c:1243: error: ?PyCFunction? undeclared here (not in a function) extensions/pyexpat.c:1243: error: expected ?}? before ?xmlparse_Parse? extensions/pyexpat.c:1245: error: expected ?}? before ?xmlparse_ParseFile? extensions/pyexpat.c:1247: error: expected ?}? before ?xmlparse_SetBase? extensions/pyexpat.c:1249: error: expected ?}? before ?xmlparse_GetBase? extensions/pyexpat.c:1251: error: expected ?}? before ?xmlparse_ExternalEntityParserCreate? extensions/pyexpat.c:1253: error: expected ?}? before ?xmlparse_SetParamEntityParsing? extensions/pyexpat.c:1255: error: expected ?}? before ?xmlparse_GetInputContext? extensions/pyexpat.c:1258: error: expected ?}? before ?xmlparse_UseForeignDTD? 
extensions/pyexpat.c:1320: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?xmlparse_dealloc?: extensions/pyexpat.c:1395: warning: implicit declaration of function ?PyObject_GC_Fini? extensions/pyexpat.c:1397: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c:1398: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c:1399: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c:1401: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1402: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:1402: error: ?temp? undeclared (first use in this function) extensions/pyexpat.c:1404: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1405: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1408: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1409: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1411: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1412: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1413: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1415: error: ?xmlparseobject? has no member named ?intern? extensions/pyexpat.c:1418: warning: implicit declaration of function ?PyObject_Del? extensions/pyexpat.c: In function ?handlername2int?: extensions/pyexpat.c:1430: warning: implicit declaration of function ?strcmp? extensions/pyexpat.c: At top level: extensions/pyexpat.c:1437: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1445: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1549: error: expected declaration specifiers or ?...? before ?PyObject? 
extensions/pyexpat.c: In function ?sethandler?: extensions/pyexpat.c:1554: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:1554: error: ?temp? undeclared (first use in this function) extensions/pyexpat.c:1554: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1556: error: ?v? undeclared (first use in this function) extensions/pyexpat.c:1556: error: ?Py_None? undeclared (first use in this function) extensions/pyexpat.c:1559: warning: implicit declaration of function ?Py_INCREF? extensions/pyexpat.c:1562: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:1564: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c: At top level: extensions/pyexpat.c:1571: error: expected declaration specifiers or ?...? before ?PyObject? extensions/pyexpat.c: In function ?xmlparse_setattr?: extensions/pyexpat.c:1574: error: ?v? undeclared (first use in this function) extensions/pyexpat.c:1575: warning: implicit declaration of function ?PyErr_SetString? extensions/pyexpat.c:1575: error: ?PyExc_RuntimeError? undeclared (first use in this function) extensions/pyexpat.c:1579: warning: implicit declaration of function ?PyObject_IsTrue? extensions/pyexpat.c:1580: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1581: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1581: error: ?xmlparseobject? has no member named ?buffer_size? extensions/pyexpat.c:1582: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1583: warning: implicit declaration of function ?PyErr_NoMemory? extensions/pyexpat.c:1586: error: ?xmlparseobject? has no member named ?buffer_used? extensions/pyexpat.c:1589: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1592: error: ?xmlparseobject? has no member named ?buffer? extensions/pyexpat.c:1593: error: ?xmlparseobject? has no member named ?buffer? 
extensions/pyexpat.c:1599: error: ?xmlparseobject? has no member named ?ns_prefixes? extensions/pyexpat.c:1601: error: ?xmlparseobject? has no member named ?ns_prefixes? extensions/pyexpat.c:1602: error: ?xmlparseobject? has no member named ?itself? extensions/pyexpat.c:1602: error: ?xmlparseobject? has no member named ?ns_prefixes? extensions/pyexpat.c:1607: error: ?xmlparseobject? has no member named ?ordered_attributes? extensions/pyexpat.c:1609: error: ?xmlparseobject? has no member named ?ordered_attributes? extensions/pyexpat.c:1615: error: ?PyExc_ValueError? undeclared (first use in this function) extensions/pyexpat.c:1623: error: ?xmlparseobject? has no member named ?returns_unicode? extensions/pyexpat.c:1628: error: ?xmlparseobject? has no member named ?specified_attributes? extensions/pyexpat.c:1630: error: ?xmlparseobject? has no member named ?specified_attributes? extensions/pyexpat.c:1642: error: too many arguments to function ?sethandler? extensions/pyexpat.c:1645: error: ?PyExc_AttributeError? undeclared (first use in this function) extensions/pyexpat.c: At top level: extensions/pyexpat.c:1676: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?Xmlparsetype? extensions/pyexpat.c:1719: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1766: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c:1778: error: array type has incomplete element type extensions/pyexpat.c:1779: error: expected ?}? before ?pyexpat_ParserCreate? extensions/pyexpat.c:1781: error: expected ?}? before ?pyexpat_ErrorString? extensions/pyexpat.c:1797: error: expected ?=?, ?,?, ?;?, ?asm? or ?__attribute__? before ?*? token extensions/pyexpat.c: In function ?initpyexpat?: extensions/pyexpat.c:1835: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:1835: error: ?m? undeclared (first use in this function) extensions/pyexpat.c:1835: error: ?d? 
undeclared (first use in this function) extensions/pyexpat.c:1835: warning: left-hand operand of comma expression has no effect extensions/pyexpat.c:1836: error: ?errmod_name? undeclared (first use in this function) extensions/pyexpat.c:1836: warning: implicit declaration of function ?PyString_FromString? extensions/pyexpat.c:1837: error: ?errors_module? undeclared (first use in this function) extensions/pyexpat.c:1838: error: ?modelmod_name? undeclared (first use in this function) extensions/pyexpat.c:1839: error: ?model_module? undeclared (first use in this function) extensions/pyexpat.c:1840: error: ?sys_modules? undeclared (first use in this function) extensions/pyexpat.c:1848: error: ?Xmlparsetype? undeclared (first use in this function) extensions/pyexpat.c:1848: error: ?PyType_Type? undeclared (first use in this function) extensions/pyexpat.c:1851: warning: implicit declaration of function ?Py_InitModule3? extensions/pyexpat.c:1855: error: ?ErrorObject? undeclared (first use in this function) extensions/pyexpat.c:1856: warning: implicit declaration of function ?PyErr_NewException? extensions/pyexpat.c:1862: warning: implicit declaration of function ?PyModule_AddObject? extensions/pyexpat.c:1866: error: expected expression before ?)? token extensions/pyexpat.c:1868: warning: implicit declaration of function ?get_version_string? extensions/pyexpat.c:1869: warning: implicit declaration of function ?PyModule_AddStringConstant? extensions/pyexpat.c:1889: warning: implicit declaration of function ?PySys_GetObject? extensions/pyexpat.c:1890: warning: implicit declaration of function ?PyModule_GetDict? extensions/pyexpat.c:1891: warning: implicit declaration of function ?PyDict_GetItem? extensions/pyexpat.c:1893: warning: implicit declaration of function ?PyModule_New? extensions/pyexpat.c:1918: error: ?list? undeclared (first use in this function) extensions/pyexpat.c:1921: warning: implicit declaration of function ?PyErr_Clear? 
extensions/pyexpat.c:1926: error: ?item? undeclared (first use in this function) extensions/pyexpat.c:1933: warning: implicit declaration of function ?PyList_Append? extensions/pyexpat.c:1996: warning: implicit declaration of function ?PyModule_AddIntConstant? extensions/pyexpat.c: In function ?clear_handlers?: extensions/pyexpat.c:2023: error: ?PyObject? undeclared (first use in this function) extensions/pyexpat.c:2023: error: ?temp? undeclared (first use in this function) extensions/pyexpat.c:2027: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:2029: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:2030: error: ?xmlparseobject? has no member named ?handlers? extensions/pyexpat.c:2032: error: ?xmlparseobject? has no member named ?itself? error: command 'gcc' failed with exit status 1 From dieter at handshake.de Sun Feb 21 07:30:45 2010 From: dieter at handshake.de (Dieter Maurer) Date: Sun, 21 Feb 2010 07:30:45 +0100 Subject: [XML-SIG] PyXML-0.8.4 error In-Reply-To: <4B7BC94C.9020608@mbigroup.it> References: <4B7BC94C.9020608@mbigroup.it> Message-ID: <19328.54037.34726.70853@gargle.gargle.HOWL> Vinicio Nocciolini wrote at 2010-2-17 11:47 +0100: >I am using Ubuntu 9.10 >This is the error putput >regards Vinicio > > >PyXML-0.8.4$ python2.5 setup.py build >running build >running build_py >running build_ext >building '_xmlplus.parsers.pyexpat' extension >gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall >-Wstrict-prototypes -fPIC -DXML_NS=1 -DXML_DTD=1 -DBYTEORDER=1234 >-DXML_CONTEXT_BYTES=1024 -DHAVE_MEMMOVE=1 -Iextensions/expat/lib >-I/usr/include/python2.5 -c extensions/pyexpat.c -o >build/temp.linux-i686-2.5/extensions/pyexpat.o >extensions/pyexpat.c:5:20: error: Python.h: No such file or directory You need to install the development package for Python (something like "python-dev") on your system. 
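
To check whether the headers are actually present before rebuilding, something along these lines may help. It is only a minimal sketch: it assumes the `sysconfig` module, which exists in Python 2.7 and later (on the Python 2.5 used above, `distutils.sysconfig.get_python_inc()` plays a similar role), and the two function names are made up for illustration.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where Python.h is expected for the running interpreter."""
    # "include" is the directory that holds the C headers for this interpreter.
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def headers_installed():
    """True if Python.h exists, i.e. the development package seems to be installed."""
    return os.path.isfile(python_header_path())


print("%s present: %s" % (python_header_path(), headers_installed()))
```

If this reports that the file is missing, the compiler will fail on `#include "Python.h"` exactly as in the build log above.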
-- 
Dieter

From stefan_ml at behnel.de Sun Feb 21 12:14:11 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 21 Feb 2010 12:14:11 +0100
Subject: [XML-SIG] Python working with XML
In-Reply-To: <710c80bb1002181242g5c0df17aid5710a1574761871@mail.gmail.com>
References: <710c80bb1002181242g5c0df17aid5710a1574761871@mail.gmail.com>
Message-ID: <4B811583.4020604@behnel.de>

James Johnston, 18.02.2010 21:42:
> I want to develop some tools for working with XML document files. This
> might require the validation of XML rules and converting various formats
> into XML. What is new and available? Some of the things I am reading are
> from 2000 - 2001.

Data conversion tends to be rather easy in Python. If you want to output
XML, there are multiple options, but you might want to start with the
xml.etree package in Python's standard library. If you need validation, use
lxml instead.

Stefan

From kulthum91 at gmail.com Mon Feb 22 14:24:12 2010
From: kulthum91 at gmail.com (sharifah ummu kulthum)
Date: Mon, 22 Feb 2010 21:24:12 +0800
Subject: [XML-SIG] HTML parse error
Message-ID: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com>

Hi guys,

I am new to Python. I just installed Python yesterday for my MythTV
project. I found a site here
<https://sayap.com/blog/2008/12/30/mythtv-s-xmltv-grabber-for-malaysia-channels>
with a channel listing grabber to get the Malaysian channels for my MythTV
box, but I get the error below. I don't know what it means.

Any insight is very much appreciated, as I am very new to Python.

bitto at bitto:~$ python grabmy.py -f my.xml
Traceback (most recent call last):
  File "grabmy.py", line 236, in <module>
    main()
  File "grabmy.py", line 225, in main
    for elem in grabber.grab(date + timedelta(i), **params_dict):
  File "grabmy.py", line 102, in grab
    html = self.get_html(date, **kwargs)
  File "grabmy.py", line 63, in get_html
    return BeautifulSoup(content)
  File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1499, in __init__
  File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1230, in __init__
  File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1263, in _feed
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36

Bitto

From stefan_ml at behnel.de Mon Feb 22 15:06:24 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 22 Feb 2010 15:06:24 +0100
Subject: [XML-SIG] HTML parse error
In-Reply-To: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com>
Message-ID: <4B828F60.9070209@behnel.de>

sharifah ummu kulthum, 22.02.2010 14:24:
> I am new to python. I have just installed python yesterday for my mythtv
> project. I found a site
> here <https://sayap.com/blog/2008/12/30/mythtv-s-xmltv-grabber-for-malaysia-channels> for
> getting channel listing grabber to get channel for Malaysia for my
> mythtv box. but I get these. I don't know what it means
> [...]
> HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36

It means that what you want to parse here is not valid HTML, i.e. the web
page is broken. The HTMLParser package in the standard library is not made
for parsing broken HTML. Use another tool like html5lib or lxml.html.

Stefan

From stefan_ml at behnel.de Mon Feb 22 15:12:51 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 22 Feb 2010 15:12:51 +0100
Subject: [XML-SIG] HTML parse error
In-Reply-To: <437a31571002220608x7d7abfcj2dc3622b2ad8474d@mail.gmail.com>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com> <4B828F60.9070209@behnel.de> <437a31571002220608x7d7abfcj2dc3622b2ad8474d@mail.gmail.com>
Message-ID: <4B8290E3.2070406@behnel.de>

sharifah ummu kulthum, 22.02.2010 15:08:
> On Mon, Feb 22, 2010 at 10:06 PM, Stefan Behnel wrote:
>
>> sharifah ummu kulthum, 22.02.2010 14:24:
>>> I am new to python. I have just installed python yesterday for my mythtv
>>> project. I found a site
>>> here <https://sayap.com/blog/2008/12/30/mythtv-s-xmltv-grabber-for-malaysia-channels> for
>>> getting channel listing grabber to get channel for Malaysia for my
>>> mythtv box. but I get these. I don't know what it means
>>> [...]
>>> HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36
>> It means that what you want to parse here is not valid HTML, i.e. the web
>> page is broken. The HTMLParser package in the standard library is not made
>> for parsing broken HTML. Use another tool like html5lib or lxml.html.
>>
>> Stefan
>>
> does it means that i have to install the tool?

Yes. That's pretty easy, though. They should be readily packaged for your
platform (Linux), so you can just install them like any other software
package. Look out for "python-html5lib" or "python-lxml".

Stefan

From stefan_ml at behnel.de Mon Feb 22 15:46:27 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 22 Feb 2010 15:46:27 +0100
Subject: [XML-SIG] HTML parse error
In-Reply-To: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com>
Message-ID: <4B8298C3.5040701@behnel.de>

sharifah ummu kulthum, 22.02.2010 14:24:
>   File "grabmy.py", line 63, in get_html
>     return BeautifulSoup(content)
>   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1499, in __init__
>   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1230, in __init__
>   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1263, in _feed
>   File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
>     self.goahead(0)
>   File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
>     k = self.parse_starttag(i)
>   File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
>     endpos = self.check_for_whole_start_tag(i)
>   File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
>     self.error("malformed start tag")
>   File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
>     raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36

Just noticed this now - you seem to be using BeautifulSoup, likely version
3.1. This version does not handle broken HTML very well, so use version
3.0.8 instead, or switch to the tools I indicated.

Note that switching tools means that you need to change your code to use
them. Just installing them is not enough.
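
To give an idea of what "changing the code" could look like, here is a rough sketch that reads table cells from deliberately broken HTML with lxml.html. The sample markup and the function name `cell_texts` are invented for illustration, and lxml is a third-party package, so the sketch returns None when it is not installed. The BeautifulSoup calls in a script (such as `find` and `findChildren`) would have to be replaced by lxml equivalents along these lines.

```python
# Hypothetical sample: unquoted attribute, no closing </td> or </tr> tags.
BROKEN_HTML = (
    "<table><tr bgcolor=#5e789c><td>9.00pm<td>News"
    "<tr><td>9.30pm<td>Movie</table>"
)


def cell_texts(html_text):
    """Return the stripped text of every <td>, or None if lxml is missing."""
    try:
        import lxml.html  # third-party; packaged as python-lxml on many systems
    except ImportError:
        return None
    # fromstring() uses a forgiving HTML parser and repairs the tree as it goes.
    root = lxml.html.fromstring(html_text)
    return [td.text_content().strip() for td in root.iter("td")]


print(cell_texts(BROKEN_HTML))
```

The point is only that the tree API differs from BeautifulSoup's, so each lookup in the existing script needs to be rewritten by hand.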

Stefan

From kulthum91 at gmail.com Tue Feb 23 04:45:36 2010
From: kulthum91 at gmail.com (sharifah ummu kulthum)
Date: Tue, 23 Feb 2010 11:45:36 +0800
Subject: [XML-SIG] HTML parse error
In-Reply-To: <4B8298C3.5040701@behnel.de>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com> <4B8298C3.5040701@behnel.de>
Message-ID: <437a31571002221945h1c0079d5i33d641b98f0fabfa@mail.gmail.com>

On Mon, Feb 22, 2010 at 10:46 PM, Stefan Behnel wrote:

> sharifah ummu kulthum, 22.02.2010 14:24:
> >   File "grabmy.py", line 63, in get_html
> >     return BeautifulSoup(content)
> >   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1499, in __init__
> >   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1230, in __init__
> >   File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1263, in _feed
> >   File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
> >     self.goahead(0)
> >   File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
> >     k = self.parse_starttag(i)
> >   File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
> >     endpos = self.check_for_whole_start_tag(i)
> >   File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
> >     self.error("malformed start tag")
> >   File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
> >     raise HTMLParseError(message, self.getpos())
> > HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36
>
> Just noticed this now - you seem to be using BeautifulSoup, likely version
> 3.1. This version does not support parsing broken HTML any well, so use
> version 3.0.8 instead, or switch to the tools I indicated.
>
> Note that switching tools means that you need to change your code to use
> them. Just installing them is not enough.
>
> Stefan
>

I am so sorry, but I really don't know how to change the code, as I have
only just started learning Python. How do I switch the version or change
the code? I don't really understand the code.
Here is the code:

'''
Copyright (c) 2008 Yap Sok Ann

This module contains xmltv grabbers for Malaysia channels.
'''

__author__ = 'Yap Sok Ann '
__license__ = 'PSF License'

import logging

from datetime import date as dt
from datetime import datetime, time, timedelta
from dateutil.tz import tzlocal
from httplib2 import Http
from lxml import etree
from urllib import urlencode
from BeautifulSoup import BeautifulSoup

channels = ['rtm1', 'rtm2', 'tv3', 'ntv7', '8tv', 'tv9']
datetime_format = '%Y%m%d%H%M%S %z'

h = Http()
h.force_exception_to_status_code = True
#h.timeout = 15

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)-8s %(process)d %(message)s',
)
log = logging.getLogger(__name__)


def strclean(s):
    s = s.strip().replace('‘', '\'').replace('’', '\'')
    if s != ' ':
        return s


class Grabber(object):

    base_url = None

    def __init__(self, channel):
        self.channel = channel
        self.url = self.base_url

    def qs_params(self, date, **kwargs):
        '''Returns a dict of params to form the url's query string
        '''
        raise NotImplementedError

    def _parse_html(self, date, html):
        '''Returns a list of dicts with the following keys:
        - mandatory: title, start
        - optional: stop, sub_title, desc, episode_number, episode_system
        '''
        raise NotImplementedError

    def get_html(self, date, **kwargs):
        params = self.qs_params(date, **kwargs)
        response, content = h.request(self.url + '?' + urlencode(params))
        if response.status == 200:
            return BeautifulSoup(content)
        else:
            log.error('Status: %s\nContent: %s' % (response.status, content))

    def parse_html(self, date, html):
        prev_schedule = None
        try:
            for schedule in self._parse_html(date, html):
                if 'stop' in schedule:
                    yield schedule
                elif prev_schedule:
                    prev_schedule['stop'] = schedule['start']
                    yield prev_schedule
                prev_schedule = schedule
        except:
            log.exception('Cannot parse html for date %s' % date)

    def to_xml(self, schedules):
        for schedule in schedules:
            program = etree.Element('programme', channel=self.channel,
                start=schedule['start'].strftime(datetime_format),
                stop=schedule['stop'].strftime(datetime_format))

            title = etree.SubElement(program, 'title')
            title.text = schedule['title']

            if schedule.get('episode_num'):
                episode_num = etree.SubElement(program, 'episode-num')
                episode_num.set('system', schedule.get('episode_system'))
                episode_num.text = schedule['episode_num']

            for field in ['sub_title', 'desc']:
                if schedule.get(field):
                    elem = etree.SubElement(program, field.replace('_', '-'))
                    elem.text = schedule[field]

            yield program

    def grab(self, date, **kwargs):
        html = self.get_html(date, **kwargs)
        if html:
            return self.to_xml(self.parse_html(date, html))


class Astro(Grabber):

    base_url = 'http://www.astro.com.my/channels/%(channel)s/Default.asp'
    params_dicts = [dict(batch=1), dict(batch=2)]
    ignores = ['No Transmission', 'Transmission Ends']

    def __init__(self, channel):
        self.channel = channel
        self.url = self.base_url % dict(channel=channel)

    def qs_params(self, date, **kwargs):
        kwargs['sDate'] = date.strftime('%d-%b-%Y')
        return kwargs

    def _parse_html(self, date, html):
        header_row = html.find('tr', bgcolor='#29487F')
        for tr in header_row.fetchNextSiblings('tr'):
            tds = tr.findChildren('td')
            title = strclean(tds[1].find('a').string)
            if title in self.ignores:
                continue

            # start time, '21:00' -> 9 PM
            hour, minute = [int(x) for x in tds[0].string.split(':')]
            start = datetime.combine(date, time(hour, minute, tzinfo=tzlocal()))

            # duration, '00:30' -> 30 minutes
            hours, minutes = [int(x) for x in tds[2].string.split(':')]
            stop = start + timedelta(hours=hours, minutes=minutes)

            yield dict(title=title, start=start, stop=stop)


class TheStar(Grabber):

    base_url = 'http://star-ecentral.com/tvnradio/tvguide/guide.asp'
    params_dicts = [dict(db='live')]

    def qs_params(self, date, **kwargs):
        kwargs['pdate'] = date.strftime('%m/%d/%Y')
        kwargs['chn'] = self.channel.replace('rtm', 'tv')
        return kwargs

    def _parse_html(self, date, html):
        last_ampm = None
        header_row = html.find('tr', bgcolor='#5e789c')
        for tr in header_row.fetchNextSiblings('tr'):
            tds = tr.findChildren('td')

            schedule = {}
            schedule['title'] = strclean(tds[1].find('b').find('font').string)
            schedule['desc'] = strclean(tds[2].find('font').string)

            episode_num = strclean(tds[3].find('font').string)
            if episode_num:
                try:
                    episode_num = int(episode_num) - 1
                    episode_num = '.' + str(episode_num) + '.'
                    episode_system = 'xmltv_ns'
                except ValueError:
                    episode_system = 'onscreen'
                schedule['episode_num'] = episode_num
                schedule['episode_system'] = episode_system

            # start time, '9.00pm' -> 9 PM
            time_str = tds[0].find('font').string
            ampm = time_str[-2:]
            hour, minute = [int(x) for x in time_str[:-2].split('.')]
            if ampm == 'pm' and hour < 12:
                hour += 12
            elif ampm =='am' and hour == 12:
                hour = 0

            if last_ampm == 'pm' and ampm == 'am':
                date = date + timedelta(1)

            schedule['start'] = datetime.combine(
                date, time(hour, minute, tzinfo=tzlocal()))

            last_ampm = ampm
            yield schedule


def main():
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option('-s', '--source', dest='source',
        help='SOURCE to grab from: Astro, TheStar. Default: TheStar')
    parser.add_option('-d', '--date', dest='date',
        help='Start DATE to grab schedules for (YYYY-MM-DD). Default: today')
    parser.add_option('-n', '--days', dest='days',
        help='Number of DAYS to grab schedules for. Default: 1')
    parser.add_option('-f', '--file', dest='filename', metavar='FILE',
        help='Output FILE to write to. Default: stdout')

    options, args = parser.parse_args()

    if options.source is None:
        cls = TheStar
    else:
        cls = globals()[options.source]

    if options.date is None:
        date = dt.today()
    else:
        date = dt(*[int(x) for x in options.date.split('-')])

    if options.days is None:
        days = 1
    else:
        days = int(options.days)

    root = etree.Element('tv')

    for channel in channels:
        grabber = cls(channel)
        for i in range(days):
            for params_dict in cls.params_dicts:
                for elem in grabber.grab(date + timedelta(i), **params_dict):
                    root.append(elem)

    xml = etree.tostring(root, encoding='UTF-8', xml_declaration=True,
        pretty_print=True)
    if options.filename is None:
        print xml
    else:
        open(options.filename, 'w').write(xml)


if __name__ == '__main__':
    main()

From stefan_ml at behnel.de Tue Feb 23 10:46:33 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Feb 2010 10:46:33 +0100
Subject: [XML-SIG] HTML parse error
In-Reply-To: <437a31571002221945h1c0079d5i33d641b98f0fabfa@mail.gmail.com>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com> <4B8298C3.5040701@behnel.de> <437a31571002221945h1c0079d5i33d641b98f0fabfa@mail.gmail.com>
Message-ID: <4B83A3F9.3050202@behnel.de>

sharifah ummu kulthum, 23.02.2010 04:45:
> I am so sorry but I really don't know how to change the code as I have just
> learn python. How am I going to switch the version or to change the code?
> Because I don't really understand the code.
>
> Here is the code:
> [...]

That's some funny code - it uses BeautifulSoup to parse HTML, and then uses
lxml to build an XML tree from it - instead of using just lxml in the first
place...

Please send an e-mail to the original author of the tool to tell him/her
about the problem. Use the project mailing list for this (if there is one).
If that doesn't help, I'd suggest installing BeautifulSoup 3.0.8 to see if
that helps.

Stefan

From kulthum91 at gmail.com Tue Feb 23 11:28:39 2010
From: kulthum91 at gmail.com (sharifah ummu kulthum)
Date: Tue, 23 Feb 2010 18:28:39 +0800
Subject: [XML-SIG] HTML parse error
In-Reply-To: <4B83A3F9.3050202@behnel.de>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com> <4B8298C3.5040701@behnel.de> <437a31571002221945h1c0079d5i33d641b98f0fabfa@mail.gmail.com> <4B83A3F9.3050202@behnel.de>
Message-ID: <437a31571002230228t6058e426j2f5aa4eaac193a9@mail.gmail.com>

On Tue, Feb 23, 2010 at 5:46 PM, Stefan Behnel wrote:

> sharifah ummu kulthum, 23.02.2010 04:45:
> > I am so sorry but I really don't know how to change the code as I have just
> > learn python. How am I going to switch the version or to change the code?
> > Because I don't really understand the code.
> >
> > Here is the code:
> > [...]
>
> That's some funny code - it uses BeautifulSoup to parse HTML, and then uses
> lxml to build an XML tree from it - instead of using just lxml in the first
> place...
>
> Please send an e-mail to the original author of the tool to tell him/her
> about the problem. Use the project mailing list for this (if there is one).
> If that doesn't help, I'd suggest installing BeautifulSoup 3.0.8 to see if
> that helps.
>
> Stefan
>

I have sent an email to the author, but I doubt there will be a quick
response, and this project does not have a mailing list. This is just an
individual class project that I have to complete, and the deadline is very
close now. How can I install BeautifulSoup 3.0.8?

# sudo easy_install BeautifulSoup 3.0.8

like this?

From stefan_ml at behnel.de Tue Feb 23 11:51:29 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 23 Feb 2010 11:51:29 +0100
Subject: [XML-SIG] HTML parse error
In-Reply-To: <437a31571002230228t6058e426j2f5aa4eaac193a9@mail.gmail.com>
References: <437a31571002220524g5a51facfibbdbe8ab64530c0@mail.gmail.com> <4B8298C3.5040701@behnel.de> <437a31571002221945h1c0079d5i33d641b98f0fabfa@mail.gmail.com> <4B83A3F9.3050202@behnel.de> <437a31571002230228t6058e426j2f5aa4eaac193a9@mail.gmail.com>
Message-ID: <4B83B331.5020703@behnel.de>

sharifah ummu kulthum, 23.02.2010 11:28:
> On Tue, Feb 23, 2010 at 5:46 PM, Stefan Behnel wrote:
>> sharifah ummu kulthum, 23.02.2010 04:45:
>>> I am so sorry but I really don't know how to change the code as I have just
>>> learn python. How am I going to switch the version or to change the code?
>>> Because I don't really understand the code.
>>>
>>> Here is the code:
>>> [...]
>> That's some funny code - it uses BeautifulSoup to parse HTML, and then uses
>> lxml to build an XML tree from it - instead of using just lxml in the first
>> place...
>>
>> Please send an e-mail to the original author of the tool to tell him/her
>> about the problem. Use the project mailing list for this (if there is one).
>> If that doesn't help, I'd suggest installing BeautifulSoup 3.0.8 to see if
>> that helps.
>>
> I have sent an email to the author and I doubt that it will be a quick
> respond. And this project does not have a mailing list. This is just an
> individual class project that I have to complete which the deadline is so
> close now. How can I install BeautifulSoup 3.0.8?
>
> # sudo easy_install BeautifulSoup 3.0.8
>
> like this?

You should consider reading the documentation of easy_install. That would
have told you that you can use

# sudo easy_install BeautifulSoup==3.0.8

Note that this (and most of the previous thread) is rather off-topic to
this list. The comp.lang.python newsgroup would have been a better choice.

Stefan
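
After installing, it may be worth confirming which BeautifulSoup series is actually being picked up. The following is only a sketch: the helper names `parse_version` and `is_3_0_series` are hypothetical, and it assumes the installed module exposes its version as a `__version__` string that can be passed in.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.0.8' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        # Keep only the digits of each dot-separated piece ('0a' becomes 0).
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def is_3_0_series(version_text):
    """True for 3.0.x releases, the series recommended in this thread."""
    return parse_version(version_text)[:2] == (3, 0)


for candidate in ("3.0.8", "3.1.0.1"):
    print("%s in 3.0 series: %s" % (candidate, is_3_0_series(candidate)))
```

In the grabber script one would pass the installed module's version string instead of the two literal examples.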