file corruption on windows - possible bug

Duncan Booth duncan.booth at invalid.invalid
Mon May 9 11:26:03 EDT 2005


Jeremy Jones wrote:

> Here is a small example piece of code to
> show the type of thing I'm doing:
> 
> #################################
> file_dict = {}
> 
> a_list = [("a", "a%s" % i) for i in range(2500)]
> b_list = [("b", "b%s" % i) for i in range(2500)]
> c_list = [("c", "c%s" % i) for i in range(2500)]
> d_list = [("d", "d%s" % i) for i in range(2500)]
> 
> 
> joined_list = a_list + b_list + c_list + d_list
> 
> for key, value in joined_list:
>     outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
>     outfile.write("%s\n" % value)
> 
> for f in file_dict.values():
>     f.close()
> #################################
> 
> Problem is, when I run this on Windows, I get 14,520 null ("\x00")
> characters at the front of the file and each file is 16,390 bytes long. 

Your call to setdefault opens the file for writing every time it is called, 
even when the key is already in the dictionary, but only the first handle is 
ever used to write. Each of the later opens in destructive ("w") mode 
truncates the file back to zero length while the first handle keeps writing 
at its own, ever-growing offset; presumably the null bytes are how Windows 
fills the gap when the buffered data is finally flushed beyond the truncated 
end of the file.
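
To see why setdefault doesn't help here: its second argument is evaluated 
before the dictionary is consulted, so the open() runs on every pass through 
the loop whether or not the key is already present. A quick illustration 
(the fake_open helper is just something I made up to count the calls without 
touching the disk):

open_calls = []

def fake_open(name, mode):
    # Count the calls instead of actually opening anything.
    open_calls.append((name, mode))
    return "handle for %s" % name

d = {}
for i in range(3):
    handle = d.setdefault("a", fake_open("a.txt", "w"))

print(len(open_calls))   # 3 -- fake_open ran three times for a single key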

The fix is simply to open each file once instead of 2500 times, e.g. 
(untested code):

for key, value in joined_list:
    if key in file_dict:
        # Reuse the handle opened the first time this key was seen.
        outfile = file_dict[key]
    else:
        # First time for this key: open the file once and remember the handle.
        outfile = file_dict[key] = open("%s.txt" % key, "w")
    outfile.write("%s\n" % value)
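
If you'd rather keep the loop body to a single line, the same idea can be 
factored into a small helper (again untested, and get_outfile is just a name 
I picked; it assumes file_dict and joined_list as in your snippet):

def get_outfile(file_dict, key):
    # Open each output file only the first time its key is seen,
    # then reuse the stored handle on every later call.
    if key not in file_dict:
        file_dict[key] = open("%s.txt" % key, "w")
    return file_dict[key]

for key, value in joined_list:
    get_outfile(file_dict, key).write("%s\n" % value)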


