etree, gzip, and BytesIO

Frank Millman frank at chagford.com
Thu Jan 21 01:22:08 EST 2021


Hi all

This question is mostly to satisfy my curiosity.

In my app I use XML to represent certain objects, such as form 
definitions and process definitions.

They are stored in a database. I use etree.tostring() when storing them 
and etree.fromstring() when reading them back. They can be quite large, 
so I use gzip to compress them before storing them as a blob.

The sequence of events when reading them back is -
    - select gzip'd data from database
    - run gzip.decompress() to get the uncompressed XML as a bytes object
    - run etree.fromstring() to convert to an etree object
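
For reference, the present sequence might look roughly like the sketch below. It assumes that etree is the standard library's xml.etree.ElementTree (the same calls exist in lxml.etree), and that the database select has already produced the blob as bytes. The function name load_via_bytes and the sample data are made up for illustration.

```python
import gzip
import xml.etree.ElementTree as etree  # assumption: stdlib etree, not lxml


def load_via_bytes(blob):
    """Decompress the whole blob first, then parse the result."""
    xml_bytes = gzip.decompress(blob)   # the full uncompressed document, as bytes
    return etree.fromstring(xml_bytes)  # returns the root Element


# Hypothetical stand-in for the gzip'd data selected from the database.
sample_blob = gzip.compress(b"<form name='demo'><field id='a'/></form>")

root = load_via_bytes(sample_blob)
print(root.tag)  # prints "form"
```

Here xml_bytes holds the complete uncompressed document until the function returns, which is the copy the rest of this post is trying to avoid.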

I was wondering if I could avoid having the unzipped string in memory, 
and create the etree object directly from the gzip'd data. I came up 
with this -

    - select gzip'd data from database
    - create a BytesIO object - fd = io.BytesIO(data)
    - use gzip to open the object - gf = gzip.open(fd)
    - run etree.parse(gf) to convert to an etree object
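
The four steps above could be sketched as follows, again assuming the standard library's xml.etree.ElementTree. The function name load_via_stream and the sample data are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as etree  # assumption: stdlib etree, not lxml


def load_via_stream(blob):
    """Parse straight from a gzip file object wrapped around the blob."""
    fd = io.BytesIO(blob)      # file-like view of the compressed bytes
    with gzip.open(fd) as gf:  # default mode is 'rb', so reads yield bytes
        return etree.parse(gf) # returns an ElementTree, not an Element


# Hypothetical stand-in for the gzip'd data selected from the database.
sample_blob = gzip.compress(b"<form name='demo'><field id='a'/></form>")

tree = load_via_stream(sample_blob)
print(tree.getroot().tag)  # prints "form"
```

One difference from the first approach: etree.parse() returns an ElementTree, so getroot() is needed to get the Element that etree.fromstring() would have returned.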

It works.

But I don't know what goes on under the hood, so I don't know if this 
achieves anything. If any of the steps involves decompressing the data 
and storing the entire string in memory, I may as well stick to my 
present approach.
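
One way to get a rough answer empirically is to compare peak allocations for the two approaches with tracemalloc. This is only a sketch under several assumptions: it uses the standard library's xml.etree.ElementTree, the helper names and the generated test document are invented, and tracemalloc only sees memory allocated through Python's allocators, so the figures are indicative at best. The parsed tree itself is included in both measurements and may well dominate them.

```python
import gzip
import io
import tracemalloc
import xml.etree.ElementTree as etree  # assumption: stdlib etree, not lxml


def via_bytes(blob):
    # Present approach: full uncompressed bytes object, then parse.
    return etree.fromstring(gzip.decompress(blob))


def via_stream(blob):
    # Alternative: let the parser read from the gzip file object.
    with gzip.open(io.BytesIO(blob)) as gf:
        return etree.parse(gf).getroot()


def peak_bytes(loader, blob):
    """Return the peak traced allocation (in bytes) while loader(blob) runs."""
    tracemalloc.start()
    try:
        root = loader(blob)  # keep the tree alive until the peak is read
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Invented test document: many repeated elements, so it compresses well.
xml = b"<forms>" + b"<form name='demo'><field id='a'/></form>" * 20000 + b"</forms>"
blob = gzip.compress(xml)

peak_a = peak_bytes(via_bytes, blob)
peak_b = peak_bytes(via_stream, blob)
print("uncompressed size:", len(xml))
print("peak via bytes:   ", peak_a)
print("peak via stream:  ", peak_b)
```

If the two peaks differ by roughly the uncompressed size, that would suggest the file-object route does avoid holding the whole document at once. If they are about the same, the present approach is probably just as good.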

Any thoughts?

Frank Millman


