recover pickled data: pickle data was truncated

Barry Scott barry at barrys-emacs.org
Sun Jan 2 07:37:16 EST 2022



> On 1 Jan 2022, at 16:13, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
> 
> I agree with Barry. You can create a folder or a file with
> pseudo-random names. I recommend using str(uuid.uuid4())

At work and personally I use ISO 8601 timestamps to make the file names unique and to make it
easy to find out when they were created.

:>>> import datetime
:>>> t = datetime.datetime.now()
:>>> t
datetime.datetime(2022, 1, 2, 12, 34, 1, 267935)
:>>> t.strftime('%Y-%m-%dT%H-%M-%S')
'2022-01-02T12-34-01'
:>>>

That is good enough as long as you create files less often than once a second,
since the name only has one-second resolution.
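
As a rough sketch of how that could be wrapped up, here is a small helper that builds
such a file name. The function name, the prefix and suffix defaults, and the optional
microseconds flag are all my own assumptions for illustration, not something from the
code discussed in this thread.

```python
import datetime


def timestamped_name(prefix="data", suffix=".json", now=None, microseconds=False):
    """Return a file name containing an ISO 8601 style timestamp.

    Hypothetical helper: colons are replaced by hyphens so the name is
    valid on file systems that do not allow ':' in file names.
    """
    if now is None:
        now = datetime.datetime.now()
    fmt = '%Y-%m-%dT%H-%M-%S'
    if microseconds:
        # Adds six digits of microseconds, for writers faster than once a second.
        fmt += '-%f'
    return f"{prefix}-{now.strftime(fmt)}{suffix}"


if __name__ == "__main__":
    print(timestamped_name())
    print(timestamped_name(microseconds=True))
```

Names built this way sort in creation order, which is handy when you list the folder.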

Oh, and yes, use JSON; it is far better than pickle as a way of exchanging data.
It is easy to read and check, and can be processed in many languages.

Barry


> 
> On Sat, 1 Jan 2022 at 14:11, Barry <barry at barrys-emacs.org> wrote:
>> 
>> 
>> 
>>> On 31 Dec 2021, at 17:53, iMath <redstone-cold at 163.com> wrote:
>>> 
>>> On Thursday, 30 December 2021 at 03:13:21 UTC+8, <Marco Sulla> wrote:
>>>>> On Wed, 29 Dec 2021 at 18:33, iMath <redsto... at 163.com> wrote:
>>>>> But I found that the size of the shelve data file didn't change much, so I guess the data are still in it. I just wonder whether there is any way to recover my data.
>>>> I agree with Barry, Chris and Avi. IMHO your data is lost. Unpickling
>>>> it by hand is hard work and may be unreliable.
>>>> 
>>>> Is there any reason you can't simply add a semaphore to avoid writing
>>>> at the same time and re-run the code and regenerate the data?
>>> 
>>> Thanks for your replies! I didn't know I should add a semaphore when writing pickle data, so I corrupted the data.
>>> Since my data was collected during daily usage, I cannot re-run the code and regenerate it.
>>> To avoid corrupting my data again, and to avoid the complexity of using a semaphore, I am now using JSON text to store my data.
>> 
>> That will not fix the problem. You will end up with corrupt json.
>> 
>> If you have one writer and one reader then maybe you can use the fact that a rename is atomic.
>> 
>> Writer does this:
>> 1. Create a new JSON file in the same folder but with a tmp name
>> 2. Rename the file from its tmp name to the public name.
>> 
>> The reader will just read the public name.
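
A minimal sketch of the write-then-rename steps, assuming a single writer and a single
reader. The function names are made up for illustration. I use os.replace() for the
rename because it overwrites an existing target on both Unix and Windows, and
tempfile.mkstemp() to pick the tmp name in the same folder.

```python
import json
import os
import tempfile


def write_json_atomically(data, public_path):
    """Write data as JSON to a tmp file, then rename it to public_path."""
    folder = os.path.dirname(os.path.abspath(public_path))
    # The tmp file must be in the same folder (same file system) as the
    # public name, otherwise the rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, public_path)
    except BaseException:
        # Do not leave a half-written tmp file behind on failure.
        try:
            os.remove(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_json(public_path):
    """The reader only ever opens the public name."""
    with open(public_path, encoding="utf-8") as f:
        return json.load(f)
```

The reader sees either the old complete file or the new complete file, never a
partly written one.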
>> 
>> I am not sure what happens in your world if the writer runs a second time before the data is read.
>> 
>> In that case you need to create a queue of files to be read.
>> 
>> But if the problem is two processes racing against each other you MUST use locking.
>> It cannot be avoided for robust operations.
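
One possible way to do the locking, sketched with a lock file created with
O_CREAT | O_EXCL so that only one process can hold it at a time. This is only an
illustration: the names file_lock and update_json, the ".lock" and ".tmp" suffixes,
and the timeout values are my assumptions. It also does not deal with a stale lock
file left behind by a process that crashed while holding the lock.

```python
import json
import os
import time
from contextlib import contextmanager


@contextmanager
def file_lock(lock_path, timeout=10.0, poll=0.05):
    """Hold an exclusive lock by creating lock_path; remove it on exit."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process at a time can succeed in creating the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_json(path, key, value):
    """Read-modify-write a JSON file while holding the lock."""
    with file_lock(path + ".lock"):
        try:
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except FileNotFoundError:
            data = {}
        data[key] = value
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename onto the public name
```

Every process that writes the file has to go through the same lock for this to help.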
>> 
>> Barry
>> 
>> 
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>> 
> 


