Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

Tue Jan 3 22:57:02 EST 2023

I am trying to wrap my head around how one goes about working with and 
editing xml elements, since it feels more complicated than it seems it 
should be. I am looking for feedback on how others might approach it, 
and to see whether I am missing anything obvious that I haven't 
discovered yet, since maybe I am wandering off in a wrong way of thinking.

I want to load elements from a template, interact with and edit them 
directly, then ultimately submit them to an API as a modified xml 
document.

Consider the following:

from lxml import objectify, etree

# Compile the 3rd-party XSD, then parse the template against it
schema = etree.XMLSchema(file="path_to_my_xsd_schema_file")
parser = objectify.makeparser(schema=schema, encoding="UTF-8")
xml_obj = objectify.parse("path_to_my_xml_file", parser=parser)
xml_root = xml_obj.getroot()

Let's say I have a Version element that is defined simply as a string 
in a 3rd-party-provided xsd schema:

<xs:element name="Version" type="xs:string" minOccurs="0"/>

and is set to a number, <Version>2342</Version>, in my document.

The xml file loads successfully against the schema with the above code.

But lxml objectify decides the element class is IntElement, and its 
pyval is an int:

Version <class 'lxml.objectify.IntElement'>
Version.pyval <class 'int'>

Let's say I want this loaded into a UI with a variety of dynamically 
loaded entry widgets so I can edit a large number of values like this 
and of many other different types.

I can assign in one of two ways (both give the same result):
xml_root.Version =
xml_root['Version'] =
(if there is some other more kosher way of assignment, let me know)

I can assign "2342" and the element suddenly becomes a 
<class 'lxml.objectify.StringElement'>

I can assign 1.4 and the element suddenly becomes a 
<class 'lxml.objectify.FloatElement'>
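
For anyone who wants to reproduce the above without my files, here is a 
minimal sketch. The <Doc> wrapper and the describe() helper are made up 
for illustration; only the Version element comes from my real document. 
It records the element class, the py:pytype annotation that objectify 
attaches on assignment, and the raw text at each step.

```python
from lxml import objectify

# Hypothetical stand-in for the real document (no schema involved here).
root = objectify.fromstring("<Doc><Version>2342</Version></Doc>")


def describe(element):
    """Return (class name, py:pytype annotation or None, raw text)."""
    return (
        type(element).__name__,
        element.get(objectify.PYTYPE_ATTRIBUTE),
        element.text,
    )


loaded = describe(root.Version)      # class inferred from the text alone

root.Version = "2342"                # assigning a Python str
as_string = describe(root.Version)

root.Version = 1.4                   # assigning a Python float
as_float = describe(root.Version)

for label, result in (("loaded", loaded),
                      ("str assigned", as_string),
                      ("float assigned", as_float)):
    print(label, result)
```

The class follows the Python type of whatever was assigned last, not 
anything declared in the schema.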

The schema is not checked during this assignment. The value could be 
invalid, like assigning "abc" to an xs:dateTime, and the assignment 
goes through anyway. The original value is lost. The only way I see to 
verify against the schema again is to validate the whole root explicitly:

schema.validate(xml_root)

This returns False because of the added xmlns:py and py:pytype 
annotations. I can strip those with:

objectify.deannotate(
    xml_root[etree.QName(xml_root.Version.tag).localname],
    cleanup_namespaces=True)

and get back to schema.validate(xml_root) returning True. BUT it 
returns True whether the element is a String, Int, Float, etc. (so long 
as the text 'could' potentially be a string or something). So let's say 
a Version is 322.1121000: it should be a string, and it validates 
against the schema as a string, but objectify now hands it to me as 
322.1121 (much more relevant for something like a product 
identification number).
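
One thing worth separating here is the converted Python value from the 
stored text. A small sketch, again with a made-up <Doc> wrapper: as far 
as I can tell, .pyval is the converted value, while .text and the 
serialized output keep the characters as they were loaded.

```python
from lxml import etree, objectify

# Hypothetical document; only the Version value mirrors the case above.
root = objectify.fromstring("<Doc><Version>322.1121000</Version></Doc>")
version = root.Version

inferred_class = type(version).__name__   # objectify's guess from the text
python_value = version.pyval              # converted Python value
raw_text = version.text                   # characters stored in the tree
serialized = etree.tostring(root, encoding="unicode")

print(inferred_class, python_value, raw_text)
print(serialized)
```

So the trailing zeros only get lost if the float itself is written back, 
e.g. by round-tripping through pyval. Reading .text for display and 
assigning strings rather than numbers should keep the value intact.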

If validation still returns False, I then have to look at the error log 
manually via schema.error_log for something like this:

api_files/Basic:0:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: 
Element '{nsstuff}StartTime': 'asdfasdfa' is not a valid value of the 
atomic type 'xs:dateTime'.
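
The log entries do not have to be parsed out of that string form. A 
sketch, assuming a tiny made-up schema in place of the real one (the 
<Doc> root and the schema_errors() helper are hypothetical): each entry 
in error_log carries line, type_name and message as separate attributes.

```python
from lxml import etree, objectify

# Made-up miniature schema standing in for the 3rd-party XSD.
XSD = b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="StartTime" type="xs:dateTime" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
schema = etree.XMLSchema(etree.fromstring(XSD))


def schema_errors(schema, element):
    """Return [(line, type_name, message), ...]; empty list if valid."""
    if schema.validate(element):
        return []
    return [(e.line, e.type_name, e.message) for e in schema.error_log]


bad = objectify.fromstring("<Doc><StartTime>asdfasdfa</StartTime></Doc>")
good = objectify.fromstring(
    "<Doc><StartTime>2023-01-03T22:57:02</StartTime></Doc>")

errors = schema_errors(schema, bad)
no_errors = schema_errors(schema, good)

for line, type_name, message in errors:
    print(line, type_name, message)
```

The message names the offending element, which a UI could use to 
highlight the right entry widget.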

Then I have to consider how I should reject the user's input. From a UI 
design standpoint it just seems like a lot of added steps, and redundant 
work on top of an object layer that doesn't really do anything other 
than give me a thumbs up on the way in and a thumbs up on the way out. 
Rather than interacting with an object that can say whether my change 
is schema approved from the get-go, I instead seem to have to parse 
100000+ lines of xsd and design UI interaction much more situationally 
and granularly to assert types and corner cases, preserve original 
values in duplicate structures, etc.
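
One way to fake a "schema approved or not" assignment might be a small 
wrapper that assigns, strips the annotations, validates the whole root, 
and puts the old text back on failure. This is only a sketch under 
assumptions: assign_checked() is a hypothetical helper, not an lxml 
API, the schema is a made-up miniature, the child is assumed to already 
exist with non-empty text, and revalidating the whole tree on every 
edit may be slow against a large document.

```python
from lxml import etree, objectify

# Made-up miniature schema standing in for the 3rd-party XSD.
XSD = b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Version" type="xs:string" minOccurs="0"/>
        <xs:element name="StartTime" type="xs:dateTime" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
schema = etree.XMLSchema(etree.fromstring(XSD))


def assign_checked(root, name, text, schema):
    """Set child `name` to `text`; undo the change if validation fails.

    Returns (True, None) on success or (False, message) on rejection.
    Assumes the child already exists and currently has text.
    """
    previous = getattr(root, name).text
    setattr(root, name, str(text))    # always assign as a string
    objectify.deannotate(root, cleanup_namespaces=True)
    if schema.validate(root):
        return True, None
    message = schema.error_log.last_error.message
    setattr(root, name, previous)     # restore the original text
    objectify.deannotate(root, cleanup_namespaces=True)
    return False, message


root = objectify.fromstring(
    "<Doc><Version>2342</Version>"
    "<StartTime>2023-01-03T22:57:02</StartTime></Doc>")

rejected = assign_checked(root, "StartTime", "abc", schema)
accepted = assign_checked(root, "Version", "322.1121000", schema)
print(rejected)
print(accepted, root.Version.text, root.StartTime.text)
```

Because everything is assigned as a string and read back through .text, 
the value the user typed is what ends up in the serialized document, 
and the schema alone decides whether it is acceptable.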

What I originally assumed xml offered doesn't seem to exist, from what 
I have found so far. I expected the schema to be the law: if my schema 
says something should be loaded as a string, it should be a string (or 
something close enough, definitely not an int or float), and attempting 
to assign something to it that doesn't match the schema should be 
denied or throw an error. I am sure enforcing that under the hood would 
probably have performance drawbacks or something. Oh well. Back to 
contemplating and tinkering.