From amitesh116 at gmail.com  Thu Aug  9 08:11:08 2007
From: amitesh116 at gmail.com (amitesh kumar)
Date: Thu, 9 Aug 2007 11:41:08 +0530
Subject: [Expat-discuss] Parse XML file
Message-ID:

Hi All,

Please help me in the below code:

import sys
import xml.parsers.expat
import dircache

rec = {}
flag = '*'
recList = {}
cnt=0

def start_element(name, attrs):
    sys.stdout.write(name+': ')
    if 'orrfnbr'==name:
        flag = 'ORRFNBR'
#        sys.stdout.write(flag)
    sys.stdout.flush()

def end_element(name):
    sys.stdout.write('\n')
    sys.stdout.flush()

def char_data(data):
    sys.stdout.write(str(repr(data))[1:])
    if flag == 'ORRFNBR':
        rec['ORRFNBR']=str(repr(data))[1:]
        sys.stdout.write(str(rec))
#        recList['1']=rec
#        cnt = cnt+1
#        sys.stdout.write(str(recList[0]['X']))
    sys.stdout.flush()

files = dircache.listdir('./xmls/')
for f in iter(files):
    print f
    g=open('./xmls/'+f, 'r')
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.ParseFile(g)
    print recList
    g.close()

------
Here I'm trying to :
1. Read each XML file in a folder.
2. Parse file.
3. Store some of the tags values as key-value pair in a map
4. Similarly maintain another collection that'll store one map per file.

Please someone help me.

--
With Regards
Amitesh K.
9850638640

From fdrake at acm.org  Thu Aug  9 15:54:56 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 9 Aug 2007 09:54:56 -0400
Subject: [Expat-discuss] Parse XML file
In-Reply-To:
References:
Message-ID: <78906083-FCD4-4A46-9E1C-4AFCAEBD6B02@acm.org>

On Aug 9, 2007, at 2:11 AM, amitesh kumar wrote:
> Please help me in the below code:

I suspect most of the people on this list won't be familiar with the
Python API to Expat; this sort of question usually goes to
comp.lang.python or the Python XML-SIG mailing list
(http://mail.python.org/mailman/listinfo/xml-sig/).

That doesn't make your question unwelcome, of course!

> Here I'm trying to :
> 1. Read each XML file in a folder.
> 2.
Parse file.
> 3. Store some of the tags values as key-value pair in a map
> 4. Similarly maintain another collection that'll store one map per
> file.

It's not clear without reading your code what you're having trouble
with.  Perhaps you should be more specific?  I suspect, from the
number of print statements in your code, that you're stabbing in the
dark, and from question 4, that you're new to Python as well.

  -Fred

--
Fred Drake

From filu at mweb.co.za  Sun Aug  5 16:45:41 2007
From: filu at mweb.co.za (Morris John)
Date: Sun, 5 Aug 2007 10:45:41 -0400
Subject: [Expat-discuss] Text
Message-ID: <001c01c7d76f$4bc5e2b0$1d548f7a@ejnzn>

A non-text attachment was scrubbed...
Name: Text.pdf
Type: application/pdf
Size: 6182 bytes
Desc: not available
Url : http://mail.libexpat.org/pipermail/expat-discuss/attachments/20070805/597b2b93/attachment.pdf

From rsennat at gmail.com  Tue Aug 14 09:19:11 2007
From: rsennat at gmail.com (Senthil Nathan)
Date: Tue, 14 Aug 2007 12:49:11 +0530
Subject: [Expat-discuss] need info on leaf nodes, attributes & overlay xml
Message-ID: <587d37680708140019x2ce2138ep8423126e78162d91@mail.gmail.com>

Hi,

I'm new to the expat parser and, of course, new to XML.
I need some general clarifications.

Say I have an XML file such as:

value Hostname identifies the NE long descr of Hostname string set,delete

1. Here, is it possible to have description, datatype, operations as
   attributes in the DOM tree generated from the parser, instead of as
   sub nodes of "hostName"?

2. Is there any way to overlay XML on a tree? I know that the expat
   parser doesn't support DTD. But are there any other options?

Please post your comments on this. It would be helpful for me to
proceed further.
Thanks
Senthil

From rsennat at gmail.com  Tue Aug 14 15:13:27 2007
From: rsennat at gmail.com (Senthil Nathan)
Date: Tue, 14 Aug 2007 18:43:27 +0530
Subject: [Expat-discuss] expat & simple c expat wrapper
Message-ID: <587d37680708140613g37cd95a6v94d57ef7f789906c@mail.gmail.com>

Hi,

I would like to know the difference between expat and scew.

Scew says it creates a DOM tree, which expat doesn't.
Or even, what's the difference between a stream-oriented parser
(e.g. expat) and a DOM parser (e.g. scew)?

Can anyone please explain more on this front?
I'm also looking at their websites, but I guess I lack something
here. So please explain.

Thanks
Senthil

From nickmacd at gmail.com  Tue Aug 14 23:04:49 2007
From: nickmacd at gmail.com (Nick MacDonald)
Date: Tue, 14 Aug 2007 17:04:49 -0400
Subject: [Expat-discuss] expat & simple c expat wrapper
In-Reply-To: <587d37680708140613g37cd95a6v94d57ef7f789906c@mail.gmail.com>
References: <587d37680708140613g37cd95a6v94d57ef7f789906c@mail.gmail.com>
Message-ID:

In a nutshell, there are two general types of parsers, DOM and SAX.
Expat is a SAX parser.  There is a lot that could be said on this
topic, but perhaps look at the eXpat mailing list archives... I am
very sure the difference between SAX and DOM has been covered there
many times (a few of them by myself.)

SAX is usually more memory efficient and faster at parsing, but if you
need to cross reference data in your document, you will need to figure
out a way to do this, as SAX is fire and forget... once you have gone
forward in the document there is no going backward without re-parsing
the whole file over again.

DOM builds a copy of the document in memory, which allows you to see
different parts all at once, but this can be quite a memory hog for
large documents, and for specially crafted ones (that have a lot of
nesting.)  For one such example, search for "million laughs", but here
is one such example which is known to crash some systems:

]>
More than two million laughs!
&laugh30;

As you can see, it is very short, but can be very deadly to a DOM
parser.  SAX should handle this with some grace, depending on how you
implement your parsing.  This should work fine for eXpat because it
won't do the ENTITY expansion for you, so you wouldn't even see the
problem unless you supported ENTITY's in your XML spec.

Nick

On 8/14/07, Senthil Nathan wrote:
> I would like to know the difference between expat and scew.
>
> Scew says it creates a DOM tree which expat doesn't.
> or even whats between stream oriented parser (eg. expat) and
> the DOM parser (eg. scew)
>
> Can anyone please explain more on this front.
> Im also looking at their websites, but I guess I lack something
> here. So please explain.

From zeus.84 at libero.it  Sat Aug 18 10:18:20 2007
From: zeus.84 at libero.it (Giulio)
Date: Sat, 18 Aug 2007 10:18:20 +0200
Subject: [Expat-discuss] Communicating between handlers
Message-ID:

Good morning, I need an example of Communicating between handlers, someone can send and help me? i must create a data structure? Thanks all for help

From webmaster at hartwork.org  Sun Aug 19 18:24:49 2007
From: webmaster at hartwork.org (Sebastian Pipping)
Date: Sun, 19 Aug 2007 18:24:49 +0200
Subject: [Expat-discuss] Communicating between handlers
In-Reply-To:
References:
Message-ID: <46C86ED1.80904@hartwork.org>

Giulio wrote:
> Good morning, I need an example of Communicating between handlers, someone can send and help me? i must create a data structure? Thanks all for help

-------------------------------------------------------

Not sure what you mean.
Much more details please!
Sebastian

From romez777 at gmail.com  Mon Aug 20 04:28:36 2007
From: romez777 at gmail.com (Roman Mashak)
Date: Sun, 19 Aug 2007 19:28:36 -0700
Subject: [Expat-discuss] parsing XML
Message-ID: <40a670230708191928o538681d6v5812a76a8b42ef3a@mail.gmail.com>

Hello,

I need to parse, in C, XML of the following structure:

666000000 -82 484000000 -80

So I took the 'expat' library to do that (I've never dealt with XML
before, though) and tried to customize the example shipped with the
library (outline.c). What I can't quite understand is:

1) Can my XML really be called XML, or is it invalid in some way?
According to the Wikipedia page on XML, a valid document should look
like this:

content

while mine is a bit different.

2) If my XML document is correct anyway, how can I parse it with
expat? What I need is, upon occurrences of the FREQ and POWER tags, to
extract their values (i.e. 666000000 for FREQ or -82 for POWER in the
above example).

So, I think I need to register a callback function for start tags and
try to do what I want in there. But how can I get the values of the
tags, and which 'expat' functions should I use? Or is there another,
simpler way?

Thanks in advance.

--
Roman

From crazybob at crazybob.org  Mon Aug 20 04:45:07 2007
From: crazybob at crazybob.org (Bob Lee)
Date: Sun, 19 Aug 2007 19:45:07 -0700
Subject: [Expat-discuss] parsing XML
In-Reply-To: <40a670230708191928o538681d6v5812a76a8b42ef3a@mail.gmail.com>
References: <40a670230708191928o538681d6v5812a76a8b42ef3a@mail.gmail.com>
Message-ID:

On 8/19/07, Roman Mashak wrote:
>
> So, I think I need to register callback function for start tags and try to
> do what I want in there. But how can I get the values of tags, which
> 'expat'
> functions to use? Or there's another, more simple way?

SetCharacterDataHandler:
http://www.xml.com/pub/a/1999/09/expat/index.html?page=3#chardatahandler

Note that you may get more than one callback for the same element
body, in which case it's up to you to concatenate the results.
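
The concatenation described above can be sketched as follows. This is
only an illustration under stated assumptions: it uses Python's
xml.parsers.expat binding (the one used in the first message of this
archive) rather than the C API, where the corresponding registration
calls are XML_SetElementHandler and XML_SetCharacterDataHandler. The
function name extract_values, the wrapper elements SCAN and CHANNEL,
and the way the sample is split into chunks are all hypothetical;
only the FREQ and POWER names and the values come from the question.

```python
import xml.parsers.expat


def extract_values(chunks, wanted=("FREQ", "POWER")):
    """Return (tag, text) pairs for each wanted element, in document order."""
    results = []
    # State shared between the three handlers.
    state = {"current": None, "pieces": []}

    def start_element(name, attrs):
        if name in wanted:
            state["current"] = name
            state["pieces"] = []

    def char_data(data):
        # Expat may deliver one element body in several callbacks,
        # so collect the fragments instead of using each one directly.
        if state["current"] is not None:
            state["pieces"].append(data)

    def end_element(name):
        if name == state["current"]:
            results.append((name, "".join(state["pieces"])))
            state["current"] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    for chunk in chunks:
        parser.Parse(chunk, False)
    parser.Parse("", True)  # signal end of input
    return results


# Hypothetical sample: SCAN and CHANNEL are made-up wrapper elements,
# and the first FREQ body is deliberately split across two chunks.
sample_chunks = [
    "<SCAN><CHANNEL><FREQ>666",
    "000000</FREQ><POWER>-82</POWER></CHANNEL>",
    "<CHANNEL><FREQ>484000000</FREQ><POWER>-80</POWER></CHANNEL></SCAN>",
]
print(extract_values(sample_chunks))
```

The text is joined only in the end-element handler, so the result does
not depend on how many character-data callbacks the parser happens to
make for a given element.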
Bob

From ali at mental.com  Mon Aug 20 10:58:27 2007
From: ali at mental.com (Albrecht Fritzsche)
Date: Mon, 20 Aug 2007 10:58:27 +0200
Subject: [Expat-discuss] Communicating between handlers (Giulio)
In-Reply-To:
References:
Message-ID: <46C957B3.8050803@mental.com>

expat-discuss-request at libexpat.org wrote:
>
>
> Today's Topics:
>
>    1. Communicating between handlers (Giulio)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 18 Aug 2007 10:18:20 +0200
> From: "Giulio"
> Subject: [Expat-discuss] Communicating between handlers
> To: "expat-discuss"
> Message-ID:
> Content-Type: text/plain; charset=iso-8859-1
>
> Good morning, I need an example of Communicating between handlers, someone can send and help me? i must create a data structure? Thanks all for help

All handlers do get the struct XML_ParserStruct passed to them. With
the function

    void XMLCALL XML_SetUserData(XML_Parser p, void *userData);

you can add your own struct for communication to it. Inside the
handlers you then retrieve it via - surprise, surprise -

    void * XMLCALL XML_GetUserData(XML_Parser p);

Hope that helps
Ali

From m-higa at jaist.ac.jp  Wed Aug 22 21:48:49 2007
From: m-higa at jaist.ac.jp (=?ISO-2022-JP?B?GyRCRWw4NkA1Q1IbKEI=?=)
Date: Thu, 23 Aug 2007 04:48:49 +0900
Subject: [Expat-discuss] About caracter handler and attribute value
Message-ID: <46CC9321.8020409@jaist.ac.jp>

Hi, I have two questions.

1. Can expat get an attribute value?
2. I wrote a C program as follows. Why, on the first line of the
   first character data, does it cut the root open tag()?

(program)
....
void char_handle(void *userdata, const XML_Char *s, int len)
{
    printf("[CHARACTERDATA] %s\n", s);
}

int main(int argc, char *argv[])
{
    char buf[BUFSIZE];
    int eofflag;
    size_t len;
    XML_Parser parser;

    if ((parser = XML_ParserCreate(NULL)) == NULL) {
        fprintf(stderr, "parser creation error\n");
        exit(-1);
    }
    XML_SetCharacterDataHandler(parser, char_handle);
    do {
        len = fread(buf, sizeof(char), BUFSIZE, stdin);
        if (ferror(stdin)) {
            fprintf(stderr, "file error\n");
            exit(-1);
        }
        eofflag = feof(stdin);
        if ((XML_Parse(parser, buf, (int)len, eofflag)) == 0) {
            fprintf(stderr, "parser error\n");
        }
    } while (!eofflag);
    return 0;
}

(XML file)
aaa bbb

(output)
[Characterdata] :<-why? aaa bbb
[Characterdata] aaa bbb
[Characterdata] bbb
[Characterdata]
[Characterdata]

Please someone help me.

Regards
Masanori Higashihara

From m-higa at jaist.ac.jp  Thu Aug 23 22:01:30 2007
From: m-higa at jaist.ac.jp (=?iso-2022-jp?B?GyRCRWw4NhsoQg==?= =?iso-2022-jp?B?GyRCQDVDUhsoQg==?=)
Date: Fri, 24 Aug 2007 05:01:30 +0900
Subject: [Expat-discuss] About character handler and attribute value
In-Reply-To: <46CD60F1.4010300@jezuk.co.uk>
References: <46CC9321.8020409@jaist.ac.jp> <46CD60F1.4010300@jezuk.co.uk>
Message-ID:

Thank you.
I rewrote it as follows. It worked.

void char_handle(void *userdata, const XML_Char *s, int len)
{
for(int i=0;i
>>Hi, I have two questions.
>> 1. Can expat get attribute value ?
>> 2. I write C program as follows.
>> but why on the first line of first character data it cut root open
>> tag().
>>
>> (program)
>> ....
>> void char_handle(void *userdata,const XML_Char *s,int len)
>> {
>> printf("[CHARACTERDATA] %s\n",s)
>> }
>>
>
> Using printf here is the problem. const XML_Char *s is not a null
> terminated string, it's a pointer into an underlying buffer. int len
> tells you how many characters there are. The text you want is
> from s to
> s+len. In the example you've given you could use a loop and
> putchar to
> output it.
>
> Jez
>