Problem with psycopg2, bytea, and memoryview

Neil Cerutti neilc at norwich.edu
Wed Jul 31 10:08:44 EDT 2013


On 2013-07-31, Frank Millman <frank at chagford.com> wrote:
>
> "Antoine Pitrou" <solipsis at pitrou.net> wrote in message 
> news:loom.20130731T114936-455 at post.gmane.org...
>> Frank Millman <frank <at> chagford.com> writes:
>>>
>>> I have some binary data (a gzipped xml object) that I want to store in a
>>> database. For PostgreSQL I use a column with datatype 'bytea', which is
>>> their recommended way of storing binary strings.
>>>
>>> I use psycopg2 to access the database. It returns binary data
>>> in the form of a python 'memoryview'.
>>>
>> [...]
>>>
>>> Using MS SQL Server and pyodbc, it returns a byte string, not
>>> a memoryview, and it does compare equal with the original.
>>>
>>> I can hack my program to use tobytes(), but it would add
>>> complication, and it would be database-specific. I would
>>> prefer a cleaner solution.
>>
>> Just cast the result to bytes (`bytes(row[1])`). It will work
>> both with bytes and memoryview objcts.
>
> Thanks for that, Antoine. It is an improvement over tobytes(),
> but i am afraid it is still not ideal for my purposes.
>
> At present, I loop over a range of columns, comparing 'before'
> and 'after' values, without worrying about their types. Strings
> are returned as str, integers are returned as int, etc. Now I
> will have to check the type of each column before deciding
> whether to cast to 'bytes'.
>
> Can anyone explain *why* the results do not compare equal? If I
> understood the problem, I might be able to find a workaround.

A memoryview will compare equal to another object that supports
the buffer protocol when the format and shape are also equal. The
database must be returning chunks of binary data in a different
shape or format than you are writing it.

Perhaps psycopg2 is returning a chunk of ints when you have
written a chunk of bytes. Check the .format and .shape members of
the return value to see.

>>> x = memoryview(b"12345")
>>> x.format
'B'
>>> x.shape
(5,)
>>> x == b"12345"
True

My guess is you're getting format "I" from psycopg2. Hopefully
there's a way to coerce your desired "B" format interpretation of
the raw data using psycopg2's API.

-- 
Neil Cerutti



More information about the Python-list mailing list