Alternatives to XML?

Joonas Liik liik.joonas at gmail.com
Fri Aug 26 10:58:46 EDT 2016


On 26 August 2016 at 16:10, Frank Millman <frank at chagford.com> wrote:
> "Joonas Liik"  wrote in message
> news:CAB1GNpQnJDENaA-GZgt0TbcvWjaKNgD3YRoiXgyY+Mim7fw0zQ at mail.gmail.com...
>
>> On 26 August 2016 at 08:22, Frank Millman <frank at chagford.com> wrote:
>> >
>> > So this is my conversion routine -
>> >
>> > lines = string.split('"')  # split on attributes
>> > for pos, line in enumerate(lines):
>> >    if pos%2:  # every 2nd line is an attribute
>> >        lines[pos] = line.replace('<', '<').replace('>', '>')
>> > return '"'.join(lines)
>> >
>>
>> or.. you could just escape all & as & before escaping the > and <,
>> and do the reverse on decode
>>
>
> Thanks, Joonas, but I have not quite grasped that.
>
> Would you mind explaining how it would work?
>
> Just to confirm that we are talking about the same thing -
>
> This is not allowed - '<root><fld name="<new>"/></root>'  [A]
>
>>>> import xml.etree.ElementTree as etree
>>>> x = '<root><fld name="<new>"/></root>'
>>>> y = etree.fromstring(x)
>
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File
> "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\xml\etree\ElementTree.py",
> line 1320, in XML
>    parser.feed(text)
> xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1,
> column 17
>
> You have to escape it like this - '<root><fld name="<new>"/></root>'
> [B]
>
>>>> x = '<root><fld name="<new>"/></root>'
>>>> y = etree.fromstring(x)
>>>> y.find('fld').get('name')
>
> '<new>'
>>>>
>>>>
>
> I want to convert the string from [B] to [A] for editing, and then back to
> [B] before saving.
>
> Thanks
>
> Frank
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list

something like.. (untested)

def escape(untrusted_string):
    ''' Use on the user provided strings to render them inert for storage
      escaping & ensures that the user cant type sth like '>' in
source and have it magically decode as '>'
    '''
    return untrusted_string.replace("&","&").replace("<",
"<").replace(">", ">")

def unescape(escaped_string):
    '''Once the user string is retreived from storage use this
function to restore it to its original form'''
    return escaped_string.replace("<","<").replace(">",
">").replace("&", "&")

i should note tho that this example is very ad-hoc, i'm no xml expert
just know a bit about xml entities.
if you decide to go this route there are probably some much better
tested functions out there to escape text for storage in xml
documents.



More information about the Python-list mailing list