XML to SQL or XML into Tables SomeHow

Frank V. Castellucci frankc at colconsulting.com
Thu May 25 08:22:09 EDT 2000


I just happen to have done this, although for a different language,
none-the-less:

Overview:
=========
We decided on the Mediator pattern with components that had specific
events going out, with interest in other events coming in. This gives
the flexibility to exchange component implementations based on target
RDBMS.

The mediator creates the classes (driven by user options, detection of
DB, etc.) and collects from each what events they generate and what
events they are interested in. After this pass, the Mediator would cross
link the events to the components for listening.

Setup and assumptions:
----------------------

The DTD entity describes a table with attribute lists that name the
columns.

<db = "test">
	<table name="foo",sex="yesplease">
		<table_name>person</table_name>
	</table>
</db>

This was choosen because (prior to finding (and still looking) for XML
schema capable compilers), the attribute lists was deemed to have
flexible constraint properties.

The database driver component is capable of responding to reasoning
request (such as which columns are
nullable, which are primary, etc., etc.)

Parse:
------
The use of a SAX capable parser (scalability) was choosen, our parser
transposed the entity parse event into a Record object with a Name
(table), and a collection of pairs (ColumnName, Value).

Listens for termination events
Generates parse and parsecomplete events

Validate:
---------
This phase does whatever validation you want it to do (column count <=
MetaColumnInfo.count), etc.

Listens for parse, parsecomplete, terminate
Generates validationpass, validationfail, validationcomplete

Format:
-------
Different underlying DBs have little quirks (Date format, Timestamp
format, etc.) which the formater would then ask the connection driver to
format, otherwise the format generates the SQL statement based on user
input or heuristics about content.
So, assuming a load operation was not specificed, if most of the content
was in the validate record for the row, we would mark the record as
InsertProbable likely, and UpdateProbable if it was the primary key and
little else [See note 1]. This was really just a suggestion to the
writer.

Listens for validationpass, validationcomplete, terminate
Generates formatted, formatcomplete, formatfailed

Write:
------
If the user selects a load operation, then the Writer would invoke (if
available) a native load utility after generating the proper load files.
If the user selects an import operation, then the writer would use the
statement signature as a key to optimize direct SQL calls to the
connection. In this mode the writer would take the suggestion from the
formatter on whether it the entry was a Update vs. Insert and attempt
the operation, if an update was attempted and error returned we would
change state of the record to insert and re-queue it for execution, for
the corollary (insert) we would do the same. Of course you want to do
only two tries per record before calling it quits <grin>.

Listens for formatted, formatcomplete, terminate
Generates writefailed, writecompleted

Notes
=====
1. If there is no primary key on the table, you can't determine update
versus insert without a tremendous (non-scalable) drag on the system.

2. The driver per database type should make every effort in it's
implementation to satisfy the reasoning queries.

3. True multi-threading is useful to keep the work flowing, but in
Python (as I understand it) you don't get it during interpretation
phases.

Does this make sense to you?

Thomas Weholt wrote:
> 
> Hi,
> 
> I want to submit a special formattet xml, nothing more than xml based
> on a specified DTD, to a module/script that "transforms" the xml data
> to a valid SQL insert-statement.
> 
> Ex.
> 
> <xml-data>
>         <database name="test">
>                 <table name ="person">
>                         <row>
>                             <field field_name="name" value="thomas">
>                             <field field_name="sex" value="male">
>                         </row>
>                         <row>
>                             <field field_name="name" value="bill">
>                             <field field_name="sex" value="male">
>                         </row>
>                         <row>
>                             <field field_name="name" value="jane">
>                             <field field_name="sex" value="female">
>                         </row>
>                 </table>
>         </database>
> </xml-data>
> 
> I want an output like :
> 
> # SQL-statements
> 
> use test;
> insert into person
> (name, sex)
> values
> ("thomas", "male");
> 
> insert into person
> (name, sex)
> values
> ("bill", "male");
> 
> insert into person
> (name, sex)
> values
> ("jane", "female");
> 
> ##################
> 
> Perhaps the entries must have a specified type like
> <field field_name="name" value="thomas" field_type="string"> etc.
> 
> For simple insertion statements this shouldn`t that hard to do, but
> I`m having a hard time locating the stuff to do it.
> 
> I`ve looked at RAX, SAX, Pyxie etc. If somebody could give me a hint
> on how to "transform" the example above I`d be happy to try and make a
> more generic module out of it. I just need some help to get started.
> 
> Thomas

-- 
Frank V. Castellucci
http://corelinux.sourceforge.net
OOA/OOD/C++ Standards and Guidelines for Linux
http://PythPat.sourceforge.net
Pythons Pattern Package



More information about the Python-list mailing list