Python - working with xml/lxml/objectify/schemas, datatypes, and assignments

aapost aapost at idontexist.club
Tue Jan 10 22:15:07 EST 2023


On 1/4/23 12:13, aapost wrote:
> On 1/4/23 09:42, Dieter Maurer wrote:
>> aapost wrote at 2023-1-3 22:57 -0500:
>>> ...
>>> Consider the following:
>>>
>>> from lxml import objectify, etree
>>> schema = etree.XMLSchema(file="path_to_my_xsd_schema_file")
>>> parser = objectify.makeparser(schema=schema, encoding="UTF-8")
>>> xml_obj = objectify.parse("path_to_my_xml_file", parser=parser)
>>> xml_root = xml_obj.getroot()
>>>
>>> let's say I have a Version element, that is defined simply as a string
>>> in a 3rd party provided xsd schema
>>>
>>> <xs:element name="Version" type="xs:string" minOccurs="0">
>>
>> Does your schema include the third party schema?
>>
>> You might have a look at `PyXB`, too.
>> It tries hard to enforce schema restrictions in Python code.
> 
> 
> Yes, to clarify, they provide the schema, which is what we use, 
> downloaded locally. Basically just trying to remain compliant with their 
> structures that they already define without reinventing the wheel for 
> numerous calls and custom types, and in a way that feels more live 
> rather than just checking validity at the end of the edits as if I were 
> modifying the XML manually.
> 
> Thank you for the suggestion, PyXB works much more like how I envisioned 
> working with xml in my head:
> 
>  >>> xml_root.Version = 1231.32000
> pyxb.exceptions_.SimpleTypeValueError: Type 
> {http://www.w3.org/2001/XMLSchema}string cannot be created from: 1231.32
>  >>> xml_root.Version = "1231.32000"
> 
> I will have to do some more testing to see how smooth the transition 
> back to a formatted document goes, since it creates a variable for all 
> possible fields defined in the type, even if they are optional and not 
> there in the situational template.
> 
> Thanks

Unfortunately, after picking it apart for a while and diving deeper into 
a rabbit hole, PyXB looks to be a no-go.

PyXB, while interesting (and I respect its complexity and depth), is 
inconsistent in how it operates if you are trying to modify and work 
with the resulting structure intuitively. It was developed on Python 2 
14 years ago and made compatible with Python 3 late. It seems like it 
was trying to maintain vast version compatibility rather than getting a 
needed overhaul, before being abandoned in 2017 after the author moved 
on to more interesting work... I don't blame him, lol. The community 
forks are just minor bug fixes currently.

There are no setValue()/_setValue() functions for SimpleTypes (the bulk 
of your objects), so you can't change their values directly. Assigning 
to them appears to work if they are nested inside a parent that has 
__setattr__ overloaded (as a default resulting structure does when you 
first load a document), but it is a rat's nest as far as what happens 
from there. Sometimes it calls .Factory(), sometimes it goes through a 
series of __init__s, and nothing is really clear on what is or is not a 
kosher approach to managing value changes. My attempts to figure out how 
to fold those paths into a single _setValue() call have failed so far.
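
To illustrate the general mechanism (a parent that checks values as they 
are assigned), here is a minimal pure-Python sketch. TypedRecord and its 
_fields table are hypothetical names made up for this example; this is 
not PyXB's implementation, just the shape of the behavior I was hoping 
to get from a single, predictable code path.

```python
class TypedRecord:
    """Hypothetical container that type-checks child values on assignment."""

    # Field name -> required Python type (a stand-in for schema-defined types).
    _fields = {"Version": str, "Count": int}

    def __setattr__(self, name, value):
        expected = self._fields.get(name)
        if expected is None:
            raise AttributeError(f"{name!r} is not a declared field")
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} expects {expected.__name__}, "
                f"cannot be created from: {value!r}"
            )
        # Every assignment funnels through this one check before being stored.
        super().__setattr__(name, value)


record = TypedRecord()
record.Version = "1231.32000"      # accepted: already a string

try:
    record.Version = 1231.32000    # rejected: float assigned to a string field
except TypeError as exc:
    rejected = str(exc)
```

After the failed assignment, record.Version still holds the earlier 
string, since the check runs before anything is stored.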

Then there are ComplexTypes, with a value called _IsSimpleContent, which 
indicates whether the type is a wrapper for a custom SimpleType or 
something that does not contain SimpleType data. These DO have 
_setValue() functions IF they contain SimpleType data, in which case the 
SimpleType is stored in a __content member variable. Assignment on these 
also appears to work, but the results aren't good: you need to use 
_setValue(), or you lose things like attributes. It would have been 
nicer if the structure of the ComplexType was called something else and 
wrapped all objects with a common set of functions.

The validate functions do not work how one would assume, like they do 
in other libraries, where they go back and verify the data. I believe 
they only function on the way in, because if the data becomes invalid 
through some manual messing with it after the fact, they still report 
the data as valid even when it's not.
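
For contrast, this is roughly the "go back and verify" behavior I mean, 
sketched with lxml. The inline schema and the Doc/Count element names 
are made up for the example (they are not from the third-party schema), 
and it assumes lxml is installed. The point is only that validate() 
checks the tree as it exists at the time of the call.

```python
from lxml import etree

# Made-up schema for illustration: a Doc element holding one integer Count.
XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Count" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
root = etree.fromstring(b"<Doc><Count>3</Count></Doc>")

# The document is valid as loaded.
valid_before = schema.validate(root)

# Break it by hand after the fact, then validate again.
root.find("Count").text = "three"
valid_after = schema.validate(root)

# The schema's error log records why the second validation failed.
errors = [entry.message for entry in schema.error_log]
```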

It seems like what I am probably looking for may reside in Java with 
JAXB, but nothing really beyond that.

generateDS doesn't really offer anything I need from what I could tell 
in messing with it, and by the looks of it, I might have to bite the 
bullet and use the xmlschema library, work with dicts, handle many more 
corner cases on my side, and just let the result be a lot clunkier than 
I was hoping. Unless I find 8 years to reinvent the wheel myself; not 
sure I am granted that ability, though. Oh well. lol

