CRC-checksum failed in gzip

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed Aug 1 12:17:44 EDT 2012


On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote:

> Full traceback:
> 
> Exception in thread Thread-8:

"DANGER DANGER DANGER WILL ROBINSON!!!"

Why didn't you say that there were threads involved? That puts a 
completely different perspective on the problem.

I *was* going to write back and say that you probably had either file 
system corruption, or network errors. But now that I can see that you 
have threads, I will revise that and say that you probably have a bug in 
your thread handling code.

I must say, Andrea, your initial post asking for help was EXTREMELY 
misleading. You over-simplified the problem to the point that it no 
longer has any connection to the reality of the code you are running. 
Please don't send us on wild goose chases after bugs in code that you 
aren't actually running.


>   there seems to be no clear pattern and just randomly fails.

When you start using threads, you have to expect these sorts of 
intermittent bugs unless you are very careful.

My guess is that you have a bug where two threads read from the same open 
file object at the same time. Since every read on that object shares state 
(the position of the file pointer), you're going to get corruption. Because 
it depends on timing details of which threads do what at exactly which 
microsecond, the effect might as well be random.

Example: suppose the file contains three blocks A, B and C, and a 
checksum. Thread 8 starts reading the file, and gets blocks A and B. Then 
thread 2 starts reading it as well, and gets half of block C. Thread 8 
gets only the rest of block C, calculates the checksum over incomplete 
data, and it doesn't match.
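
The interleaving described above can be replayed deterministically in a 
single thread, which makes the effect easier to see than a real race. The 
following is only a sketch under assumptions: the block size, the byte 
strings, and the names thread8, thread2 and worker are invented for 
illustration, and io.BytesIO stands in for the real file. The second half 
shows one possible fix, giving each thread its own file object so that no 
position is shared. Serialising access with a threading.Lock would be 
another option.

```python
import io
import threading
import zlib

BLOCK = 4
data = b"AAAABBBBCCCC"            # three blocks: A, B and C (made-up content)
expected_crc = zlib.crc32(data)

# The failure, replayed step by step on ONE shared file object.
shared = io.BytesIO(data)         # one object means one shared read position
thread8 = shared.read(2 * BLOCK)  # "thread 8" reads blocks A and B
thread2 = shared.read(BLOCK // 2) # "thread 2" takes half of block C
thread8 += shared.read()          # "thread 8" only gets the rest of C
crc_shared = zlib.crc32(thread8)  # checksum over incomplete data

# One possible fix: every thread opens its own file object.
results = {}

def worker(name):
    # Stands in for open(path, "rb") called inside each thread.
    private = io.BytesIO(data)
    results[name] = zlib.crc32(private.read())

threads = [threading.Thread(target=worker, args=(n,)) for n in (2, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(thread8)                    # what "thread 8" actually saw
print(crc_shared == expected_crc)
print(all(crc == expected_crc for crc in results.values()))
```

With a shared object, "thread 8" never sees the two bytes that "thread 2" 
consumed, so its checksum is computed over different data than what is in 
the file.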

I recommend that you run a file system check on the remote disk. If it 
passes, you can eliminate file system corruption. Also, run some network 
diagnostics, to eliminate corruption introduced in the network layer. But 
I expect that you won't find anything there, and the problem is a simple 
thread bug. Simple, but really, really hard to find.
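
A cheap complement to those checks is to read the suspect file from start 
to finish in a single thread. If that passes, the bytes on disk are 
probably fine and the threaded code becomes the main suspect. The sketch 
below assumes a helper named gzip_is_intact, which is hypothetical and not 
part of the gzip module, and it builds its own throwaway files so that it 
can be run as-is.

```python
import gzip
import os
import tempfile
import zlib

def gzip_is_intact(path, chunk_size=64 * 1024):
    """Return True if the whole gzip file decompresses and its CRC matches.

    Hypothetical helper. Note that any OSError, including a missing file,
    is also reported as False.
    """
    try:
        with gzip.open(path, "rb") as gz:
            # Reading to the end forces the CRC-32 check in the trailer.
            while gz.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False

# Small self-check using throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.gz")
    with gzip.open(good, "wb") as gz:
        gz.write(b"block A block B block C")

    # Damage the stored CRC-32, which sits in the 8-byte gzip trailer.
    with open(good, "rb") as f:
        raw = bytearray(f.read())
    raw[-8] ^= 0xFF
    bad = os.path.join(tmp, "bad.gz")
    with open(bad, "wb") as f:
        f.write(raw)

    good_ok = gzip_is_intact(good)
    bad_ok = gzip_is_intact(bad)

print(good_ok, bad_ok)
```

If the same file passes this check repeatedly but still fails 
intermittently inside the threaded program, that points away from the 
disk and the network.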

Good luck.


-- 
Steven


