[IronPython] Portable use of pickle.dumps()
Michael Foord
fuzzyman at voidspace.org.uk
Fri May 29 17:10:43 CEST 2009
Robert Smallshire wrote:
> Hi Michael,
>
>
>>> I'm trying to get some commercial code for a simple object database we
>>> have written for Python 2.6 to work with IronPython 2.6. In Python 2.6
>>> the return type of pickle.dumps() is str, which is of course a byte
>>> string. In IronPython 2.6 it is also str, which is of course a
>>> unicode string. This 'compatibility' is fine until I put those
>>> strings into a database, at which point my interoperability between
>>> CPython and IronPython goes off the rails.
>>>
>>>
>> How is this actually a problem?
>>
>> I mean, can you provide a specific example of where a string in
>> IronPython doesn't behave as a byte string in CPython. I'm sure there
>> are such examples, but those may be bugs that the IPy team can fix. In
>> practice I've encountered these problems very rarely.
>>
>
> My opening paragraph may be ambiguously worded - by 'interoperability' I
> didn't mean the ability to run the same code unchanged on CPython and
> IronPython (I have to change the code anyway to use a different database
> adapter) - I meant interoperability between pickles persisted into a
> database from both IronPython and CPython.
>
So are you telling the database that it is binary data or text?

Is the question how to go from a pickle string in IronPython to a byte
array that you can pass to the database adaptor without going through an
explicit encode (which would transform the data)?

(One technique would be to explicitly use pickle protocol 0, which is
less efficient but only creates ASCII characters; it is actually the
default in Python 2. Another alternative would be to use JSON or YAML
instead of pickle.)

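
A minimal sketch of the protocol 0 suggestion, written so it can be tried
on CPython. It checks that a protocol 0 pickle of some simple ASCII-only
data contains only byte values below 128, while a protocol 2 pickle of the
same data does not. The helper name is_ascii_only and the sample record are
made up for illustration. Data containing non-ASCII text may not stay
ASCII-only even under protocol 0, so treat this as a check to run rather
than a guarantee.

```python
import pickle

# Sample data for illustration only; every string in it is plain ASCII.
record = {"name": "fish", "count": 7, "ratio": 7.2}

# Protocol 0 is the older text-based pickle format.
text_pickle = pickle.dumps(record, protocol=0)

# Protocol 2 is a binary format and begins with the byte 0x80.
binary_pickle = pickle.dumps(record, protocol=2)


def is_ascii_only(data):
    """Hypothetical helper: True if every byte value in data is below 128."""
    return all(value < 128 for value in bytearray(data))


print(is_ascii_only(text_pickle))
print(is_ascii_only(binary_pickle))
```
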
Here is an example of getting a byte array from a binary pickle in
IronPython:
>>> import pickle
>>> class A(object):
...     b = 'hello'
...     c = (None, 'fish', 7.2, 7j)
...     a = {1: 2}
...
>>> p = pickle.dumps(A(), protocol=2)
>>> p
u'\x80\x02c__main__\nA\nq\x00)\x81q\x01}q\x02b.'
>>> from System import Array, Byte
>>> a = Array[Byte](tuple(Byte(ord(c)) for c in p))
>>> a
Array[Byte]((<System.Byte object at 0x0000000000000033 [128]>,
<System.Byte obje...
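
The same character-to-byte mapping can be sketched without the .NET types,
so it can be tried on CPython. The pickle is decoded as Latin-1 here only to
imitate a string whose characters each hold one byte value, as in the
IronPython session above. text_to_bytes is a hypothetical helper name, not
part of pickle or IronPython.

```python
import pickle


def text_to_bytes(text):
    """Hypothetical helper: map each character to the byte with the same value."""
    # bytearray rejects any value outside 0-255, so real text cannot slip through.
    return bytes(bytearray(ord(ch) for ch in text))


original = {"b": "hello", "c": (None, "fish", 7.2)}
raw = pickle.dumps(original, protocol=2)

# Stand-in for an IronPython 2.6 pickle string: one character per byte.
as_text = raw.decode("latin-1")

recovered = text_to_bytes(as_text)
print(recovered == raw)
print(pickle.loads(recovered) == original)
```

Because bytearray raises ValueError for any code point above 255, the
conversion fails loudly if the string has already been through a text
encoding that changed its contents.
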
I hope this is at least slightly helpful. :-)
Michael
> My basic issue is that the 'str' unavoidably implies certain semantics when
> calling .NET APIs from IronPython. These APIs interpret str as text rather
> than just bytes, which therefore gets transformed by various text encodings,
> such as UTF-8 to UTF-16. Such encodings are undesirable for my pickled data
> since the result is no longer necessarily a valid pickle. I suppose the
> intention in Python 3.0 is that 'bytes' doesn't carry any semantics with it;
> it's just data, which is why pickle.dumps() in Python 3.0 returns bytes
> rather than str.
>
> I want to push plain old byte arrays into the database from both CPython and
> IronPython, so I can avoid any head-scratching confusion with database
> adapters and/or databases inappropriately encoding or decoding my data.
>
>
>> For example "data = [ord(c) for c in some_string]" has behaved as
>> expected many times for me in IronPython (and could help you turn
>> strings into bytes).
>>
>
> Thanks. I'll try something based on that.
>
>
>> Is this a theoretical problem at this stage or an actual problem?
>>
>
> It's an actual problem with SQLiteParameter.Value from the SQLite ADO.NET
> provider. I think our original CPython code is a bit sloppy with respect to
> the distinction between text strings and byte arrays, so I'll probably need
> to tighten things up on both sides.
>
> Would you agree that using unicode() and bytes() everywhere and avoiding
> str() gives code that has the same meaning in Python 2.6, IronPython 2.6 and
> Python 3.0? Do you think this would be a good guideline to follow until we
> can leave Python 2.x behind?
>
> Many thanks,
>
> Rob
>
>
>
>
--
http://www.ironpythoninaction.com/