Alternatives to XML?

Joonas Liik liik.joonas at gmail.com
Fri Aug 26 11:00:30 EDT 2016


On 26 August 2016 at 17:58, Joonas Liik <liik.joonas at gmail.com> wrote:
> On 26 August 2016 at 16:10, Frank Millman <frank at chagford.com> wrote:
>> "Joonas Liik"  wrote in message
>> news:CAB1GNpQnJDENaA-GZgt0TbcvWjaKNgD3YRoiXgyY+Mim7fw0zQ at mail.gmail.com...
>>
>>> On 26 August 2016 at 08:22, Frank Millman <frank at chagford.com> wrote:
>>> >
>>> > So this is my conversion routine -
>>> >
>>> > lines = string.split('"')  # split on attributes
>>> > for pos, line in enumerate(lines):
>>> >    if pos%2:  # every 2nd line is an attribute
>>> >        lines[pos] = line.replace('&lt;', '<').replace('&gt;', '>')
>>> > return '"'.join(lines)
>>> >
>>>
>>> or.. you could just escape all & as &amp; before escaping the > and <,
>>> and do the reverse on decode
>>>
>>
>> Thanks, Joonas, but I have not quite grasped that.
>>
>> Would you mind explaining how it would work?
>>
>> Just to confirm that we are talking about the same thing -
>>
>> This is not allowed - '<root><fld name="<new>"/></root>'  [A]
>>
>> >>> import xml.etree.ElementTree as etree
>> >>> x = '<root><fld name="<new>"/></root>'
>> >>> y = etree.fromstring(x)
>>
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File
>> "C:\Users\User\AppData\Local\Programs\Python\Python35\lib\xml\etree\ElementTree.py",
>> line 1320, in XML
>>    parser.feed(text)
>> xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1,
>> column 17
>>
>> You have to escape it like this - '<root><fld name="&lt;new&gt;"/></root>'
>> [B]
>>
>> >>> x = '<root><fld name="&lt;new&gt;"/></root>'
>> >>> y = etree.fromstring(x)
>> >>> y.find('fld').get('name')
>>
>> '<new>'
>>>>>
>>>>>
>>
>> I want to convert the string from [B] to [A] for editing, and then back to
>> [B] before saving.
>>
>> Thanks
>>
>> Frank
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>
> something like.. (untested)
>
> def escape(untrusted_string):
>     ''' Use on the user provided strings to render them inert for storage
>       escaping & ensures that the user can't type sth like '&gt;' in
> source and have it magically decode as '>'
>     '''
>     return untrusted_string.replace("&","&amp;").replace("<",
> "&lt;").replace(">", "&gt;")
>
> def unescape(escaped_string):
>     '''Once the user string is retrieved from storage use this
> function to restore it to its original form'''
>     return escaped_string.replace("&lt;","<").replace("&gt;",
> ">").replace("&amp;", "&")
>
> i should note tho that this example is very ad-hoc, i'm no xml expert
> just know a bit about xml entities.
> if you decide to go this route there are probably some much better
> tested functions out there to escape text for storage in xml
> documents.

you might want to un-wrap that before testing tho.. no idea why my
messages get mutilated like that :(
(sent using gmail, maybe somebody can comment on that?)
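
As for the "better tested functions" mentioned above: one candidate is the standard library's xml.sax.saxutils, which provides escape() and unescape(). A minimal sketch of the same round trip follows. The sample value and the variable names are only illustrative, and the extra mapping for double quotes is an assumption about what an attribute value might contain.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as etree

# Hypothetical attribute value as the user would type it in the editor.
raw = '<new> & "improved"'

# escape() replaces & first, then > and <. The optional mapping is applied
# afterwards; here it also escapes double quotes so the result can sit
# safely inside name="...".
stored = escape(raw, {'"': '&quot;'})
xml_text = '<root><fld name="%s"/></root>' % stored

# The XML parser decodes the entities again when reading the attribute.
parsed = etree.fromstring(xml_text).find('fld').get('name')

# unescape() reverses the process by hand, handling &amp; last.
restored = unescape(stored, {'&quot;': '"'})

print(stored)
print(parsed == raw, restored == raw)
```

Both directions handle the ampersand in the order described earlier (first when escaping, last when unescaping), so a user who literally types '&gt;' gets '&gt;' back rather than '>'.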


