[Tutor] Re Problems with creating XML-documents

Karjer Jdfjdf karper12345 at yahoo.com
Thu Apr 15 08:03:33 CEST 2010


>> I'm having problems with creating XML-documents, 
>> because I don't seem to write it to a document correctly. 

>Is that because you don't understand XML or because the 
>output is not what you expect? How is the data being generated? 
>Are you parsing an existing XML source or creating the XML 
>from scratch? I'm not sure I understand your problem.

I know the theory of XML but have never really used it, and
I'm a bit insecure about it.

Basically I'm doing the following:

1. retrieve data from a database ( instance in q )
2. pass the data to an external java-program that requires file-input 
3. the java-program modifies the inputfile and creates an outputfile based on the inputfile
4. I read the outputfile and try to parse it.

1 to 3 are performed by a separate program that creates the XML.
4 is a program that tries to parse it (and then performs other
modifications using Python).

When I try to parse the outputfile it raises various errors, such as:
   * ExpatError: not well-formed (invalid token):
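One way to narrow down an error like this is to ask the parser where it stopped. The following is only a minimal sketch: find_parse_error and the sample string are hypothetical, and the bare "&" is just one common cause of "invalid token", not necessarily the cause here. It relies on the lineno, offset and code attributes that ExpatError carries.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, ErrorString


def find_parse_error(xml_text):
    """Return None if xml_text parses, else (line, column, message, source line)."""
    try:
        minidom.parseString(xml_text)
    except ExpatError as err:
        lines = xml_text.splitlines()
        # err.lineno is 1-based; err.offset is the column within that line
        bad_line = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        return err.lineno, err.offset, ErrorString(err.code), bad_line
    return None


# Hypothetical sample: an unescaped "&" in the element text
sample = '<record id="1">\n <order>nuts & bolts</order>\n</record>\n'
print(find_parse_error(sample))
```

Running this against a saved copy of the outputfile avoids another 10-minute run of the Java-program for every attempt.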

Basically it usually has something to do with XML that is not well-formed.
Unfortunately the Java-program also alters the content at essential
points, such as inserting spaces in tags (e.g. id="value" to id = " value " ),
which makes it even harder. The Java is really a b&%$#!, but I have
no alternatives because it is custom-made (but very poorly imho).

Sorry, I've not been clear, but it's very difficult and frustrating for
me to troubleshoot this properly because the Java-program is quite huge,
takes a long time to load before doing its actions, and also requires a
lot of time when running. The minimum is about 10 minutes per run.

This means trying a few little things takes hours.
Because of the long load and processing time of the Java-program I'm forced
to store the output in a single file instead of processing it record by record.


Also each time I have to change something I have to modify functions in 
different libraries that perform specific functions. This probably means
that I've not done it the right way in the first place.



>>      text = str('<record id="' + str(instance.id)+ '">\n' + \
' <date>' + str(instance.datetime) + ' </date>\n' + \
' <order>' + instance.order + ' </order>\n' + \
'</record>\n')

>You can simplify this quite a lot. You almost certainly don't need 
>the outer str() and you probably don't need the \ characters either.

I use a very simplified text-variable here. In reality I also include
other fields which contain numeric values as well. I use the \ to
keep each XML-tag on a separate line so that it stays readable.
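If the text is built by hand like this, any "&", "<" or ">" in a field value will produce XML that is not well-formed. A minimal sketch of the same record using a triple-quoted template plus the escaping helpers from xml.sax.saxutils follows; record_to_xml, RECORD_TEMPLATE and the sample values are hypothetical, and it is only an assumption that unescaped characters play a part in the errors above.

```python
from xml.sax.saxutils import escape, quoteattr

# One tag per line, without needing backslash continuations
RECORD_TEMPLATE = """<record id=%s>
 <date>%s</date>
 <order>%s</order>
</record>
"""


def record_to_xml(record_id, date, order):
    # quoteattr() adds the surrounding quotes and escapes the attribute value;
    # escape() replaces &, < and > in element text.
    return RECORD_TEMPLATE % (
        quoteattr(str(record_id)),
        escape(str(date)),
        escape(str(order)),
    )


print(record_to_xml(12, "2010-04-15 08:03:33", "nuts & bolts <urgent>"))
```

The numeric fields go through str() first, so they can be passed in the same way as the text fields.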


>Also it might be easier to use a triple quoted string and format 
>characters to insert the data values.

>> When I try to parse it, it keeps giving errors. 

>Why do you need to parse it if you are creating it?
>Or is this after you read it back later? I don't understand the 
>sequence of processing here.

>> So I tried to use an external library jaxml, 

>Did you try to use the standard library tools that come with Python, 
>like ElementTree or even sax?

I've been trying to do this with minidom, but I'm not sure if this
is the right solution because I'm not very familiar with writing or
parsing XML.
At the moment I'm tempted to do a line-by-line parse and trigger on
an identifier-string that identifies the end and start of a record. 
But that way I'll never learn XML.
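A record-by-record pass is also possible with an XML parser, using ElementTree's iterparse. This is only a sketch under some assumptions: the file is well-formed, the records sit under a single root element, and iter_records and the sample data are hypothetical. The padded values mimic the extra spaces the Java-program is said to insert, and they are stripped after parsing.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield (id, order) for each <record> in a file or file-like object."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            # Strip whitespace, since the values may come back padded
            yield elem.get("id", "").strip(), (elem.findtext("order") or "").strip()
            elem.clear()  # release the children of the finished record


# Hypothetical output with spaces around "=" and inside the values
sample = io.BytesIO(
    b'<records>'
    b'<record id = " 1 "><order> nuts </order></record>'
    b'<record id = " 2 "><order> bolts </order></record>'
    b'</records>'
)
print(list(iter_records(sample)))
```

Spaces around the "=" are still well-formed XML, so the parser accepts them. Only the padded values need cleaning up afterwards.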


>I think we need a few more pointers to the root cause here.




      