recover pickled data: pickle data was truncated

Barry barry at barrys-emacs.org
Sat Jan 1 08:09:16 EST 2022



> On 31 Dec 2021, at 17:53, iMath <redstone-cold at 163.com> wrote:
> 
> On Thursday, 30 December 2021 at 03:13:21 UTC+8, <Marco Sulla> wrote:
>>> On Wed, 29 Dec 2021 at 18:33, iMath <redsto... at 163.com> wrote: 
>>> But I found that the size of the shelve data file didn't change much, so I guess the data is still in it. I just wonder whether there is any way to recover my data.
>> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling 
>> it by hand is hard work and may be unreliable. 
>> 
>> Is there any reason you can't simply add a semaphore to avoid writing 
>> at the same time and re-run the code and regenerate the data?
> 
> Thanks for your replies! I didn't know I needed a semaphore when writing pickle data, so I corrupted the data.
> Since my data was collected during daily usage, I cannot re-run the code and regenerate it.
> To avoid corrupting my data again, and to avoid the complexity of using a semaphore, I am now using JSON text to store my data.

That will not fix the problem. You will end up with corrupt JSON instead, because the corruption comes from concurrent writes, not from the file format.

If you have one writer and one reader, then maybe you can use the fact that a rename is atomic.

Writer does this:
1. Create a new JSON file in the same folder, but with a tmp name.
2. Rename the file from its tmp name to the public name.

The reader will just read the public name.
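
As a rough sketch of the writer steps above, something like the following might do. The function names are made up for illustration, and it assumes os.replace() is atomic on your filesystem, which holds on POSIX when both names are in the same folder; on Windows the rename can fail while the reader has the public file open.

```python
import json
import os
import tempfile


def write_json_atomically(data, public_path):
    """Write data as JSON to a tmp file, then rename it to public_path."""
    folder = os.path.dirname(os.path.abspath(public_path))
    # The tmp file lives in the same folder so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=folder, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, public_path)  # swap the tmp name for the public name
    except BaseException:
        # Something failed before the rename finished: remove the tmp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_json(public_path):
    """Reader side: only ever opens the public name."""
    with open(public_path, encoding="utf-8") as f:
        return json.load(f)
```

The reader sees either the old complete file or the new complete file, never a half-written one.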

I am not sure what happens in your world if the writer runs a second time before the data is read.

In that case you need to create a queue of files to be read.
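
One possible shape for such a queue, assuming a single writer and a single reader sharing a folder: each write becomes its own uniquely named file, and the reader consumes and deletes them. The names enqueue and drain and the file naming scheme are invented for this sketch.

```python
import json
import os
import time
import uuid


def enqueue(data, queue_dir):
    """Writer: publish data as a new, uniquely named file in queue_dir."""
    os.makedirs(queue_dir, exist_ok=True)
    # The timestamp keeps names roughly in write order; the uuid avoids collisions.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    # The file only gets its .json name once it is complete.
    os.replace(tmp_path, os.path.join(queue_dir, name))
    return name


def drain(queue_dir):
    """Reader: return all queued items in name order and delete their files."""
    items = []
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".json"):
            continue  # skip tmp files that are still being written
        path = os.path.join(queue_dir, name)
        with open(path, encoding="utf-8") as f:
            items.append(json.load(f))
        os.unlink(path)
    return items
```

This only holds up with one reader. Two readers draining the same folder would race on the delete and need locking.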

But if the problem is two processes racing against each other, you MUST use locking.
It cannot be avoided for robust operation.
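
A minimal sketch of one way to lock, using a lock file created with O_CREAT | O_EXCL so that only one process can hold it at a time. The helper names file_lock and update_json are hypothetical, and this simple version does not recover from a stale lock file left behind by a crashed process.

```python
import contextlib
import json
import os
import time


@contextlib.contextmanager
def file_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock represented by the existence of lock_path."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process at a time can create it.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)  # release the lock


def update_json(path, key, value):
    """Read-modify-write a JSON file while holding the lock."""
    with file_lock(path + ".lock"):
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except FileNotFoundError:
            data = {}  # first write: start from an empty dict
        data[key] = value
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp_path, path)
```

Every process that touches the file has to go through the same lock for this to work; a single writer that skips it brings the race back.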

Barry


> -- 
> https://mail.python.org/mailman/listinfo/python-list


