Strange occasional marshal error

Graham Stratton grahamstratton at gmail.com
Thu Mar 3 10:09:03 EST 2011


On Mar 2, 3:01 pm, Graham Stratton <grahamstrat... at gmail.com> wrote:
> We are using marshal for serialising objects before distributing them
> around the cluster, and extremely occasionally a corrupted marshal is
> produced. The current workaround is to serialise everything twice and
> check that the serialisations are the same. On the rare occasions that
> they are not, I have dumped the files for comparison. It turns out
> that there are a few positions within the serialisation where
> corruption tends to occur (these positions seem to be independent of
> the data or the size of the complete serialisation). These are:
>
> 4 bytes starting at 548867 (0x86003)
> 4 bytes starting at 4398083 (0x431c03)
> 4 bytes starting at 17595395 (0x10c7c03)
> 4 bytes starting at 19794819 (0x12e0b83)
> 4 bytes starting at 22269171 (0x153ccf3)
> 2 bytes starting at 25052819 (0x17e4693)
> 3 bytes starting at 28184419 (0x1ae0f63)
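
For illustration, the double-serialisation check described above could
look roughly like the sketch below. The function names dumps_checked
and diff_runs are hypothetical and are not the code running on the
cluster; diff_runs reports differences in the same "N bytes starting
at offset" form as the list above.

```python
import marshal


def dumps_checked(obj):
    """Serialise obj twice; return the bytes only if both runs agree."""
    first = marshal.dumps(obj)
    second = marshal.dumps(obj)
    if first != second:
        raise ValueError("marshal output differs between two runs")
    return first


def diff_runs(a, b):
    """Return (offset, length) for each run of differing bytes."""
    runs = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, common - start))
    return runs
```

When the two serialisations disagree, calling diff_runs on the two
dumped files gives the corruption offsets directly.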

I modified marshal.c to print a message each time it extends the
string buffer that the marshal is written to. This gave me these
results:

>>> s = marshal.dumps(list((i, str(i)) for i in range(1400000)))
Resizing string from 50 to 1124 bytes
Resizing string from 1124 to 3272 bytes
Resizing string from 3272 to 7568 bytes
Resizing string from 7568 to 16160 bytes
Resizing string from 16160 to 33344 bytes
Resizing string from 33344 to 67712 bytes
Resizing string from 67712 to 136448 bytes
Resizing string from 136448 to 273920 bytes
Resizing string from 273920 to 548864 bytes
Resizing string from 548864 to 1098752 bytes
Resizing string from 1098752 to 2198528 bytes
Resizing string from 2198528 to 4398080 bytes
Resizing string from 4398080 to 8797184 bytes
Resizing string from 8797184 to 17595392 bytes
Resizing string from 17595392 to 19794816 bytes
Resizing string from 19794816 to 22269168 bytes
Resizing string from 22269168 to 25052814 bytes
Resizing string from 25052814 to 28184415 bytes
Resizing string from 28184415 to 31707466 bytes
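
The sizes in this log follow a simple pattern, sketched below. The
rule is inferred from the printed numbers rather than quoted from
marshal.c: the buffer doubles and adds 1024 bytes until that would
exceed 32 MiB, after which it grows by one eighth. The names
next_size and resize_points are hypothetical.

```python
def next_size(size):
    """Growth rule inferred from the log, not copied from marshal.c."""
    newsize = size + size + 1024
    if newsize > 32 * 1024 * 1024:
        # Past 32 MiB, the log shows growth of one eighth instead.
        newsize = size + (size >> 3)
    return newsize


def resize_points(start, limit):
    """List buffer sizes from start until limit is reached."""
    sizes = [start]
    while sizes[-1] < limit:
        sizes.append(next_size(sizes[-1]))
    return sizes
```

Starting from 50 bytes, resize_points(50, 31707466) reproduces every
size printed above.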

Every corruption point occurs exactly three bytes above an extension
point (for the last two, after rounding the extension point up to the
next word boundary). This clearly
isn't a coincidence, but I can't see where there could be a problem.
I'd be grateful for any pointers.
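
The arithmetic behind that observation can be checked with a short
sketch. The 4-byte word size is an assumption; 8 or 16 bytes give the
same result for these numbers. The names round_up and matching_resize
are hypothetical.

```python
# Offsets where corruption was observed (from the first message).
CORRUPTION_OFFSETS = [548867, 4398083, 17595395, 19794819,
                      22269171, 25052819, 28184419]

# New buffer sizes reported by the modified marshal.c.
RESIZE_POINTS = [1124, 3272, 7568, 16160, 33344, 67712, 136448,
                 273920, 548864, 1098752, 2198528, 4398080, 8797184,
                 17595392, 19794816, 22269168, 25052814, 28184415,
                 31707466]


def round_up(n, word=4):
    """Round n up to a multiple of word (word size is assumed)."""
    return (n + word - 1) // word * word


def matching_resize(offset):
    """Return the resize point whose rounded-up value + 3 is offset."""
    for size in RESIZE_POINTS:
        if round_up(size) + 3 == offset:
            return size
    return None


for offset in CORRUPTION_OFFSETS:
    print(offset, matching_resize(offset))
```

Every corruption offset maps to one resize point, although not every
resize point has a corruption associated with it.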

Thanks,

Graham


