file corruption on windows - possible bug

Jeremy Jones zanesdad at bellsouth.net
Mon May 9 10:54:22 EDT 2005


I've written a piece of code that iterates through a list of items and
chooses the file to write each item's data to based on a key in the item
itself.  Here is a small example that shows the type of thing I'm
doing::

#################################
file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]


joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
    outfile.write("%s\n" % value)

for f in file_dict.values():
    f.close()
#################################

The problem is that when I run this on Windows, each file is 16,390 bytes
long and starts with 14,520 null ("\x00") characters.  When I run the
same script on Linux, each file is 13,890 bytes and contains no "\x00"
characters.  This piece of code::

#################################
import cStringIO

file_dict = {}

a_list = [("a", "a%s" % i) for i in range(2500)]
b_list = [("b", "b%s" % i) for i in range(2500)]
c_list = [("c", "c%s" % i) for i in range(2500)]
d_list = [("d", "d%s" % i) for i in range(2500)]


joined_list = a_list + b_list + c_list + d_list

for key, value in joined_list:
    #outfile = file_dict.setdefault(key, open("%s.txt" % key, "w"))
    outfile = file_dict.setdefault(key, cStringIO.StringIO())
    outfile.write("%s\n" % value)

for key, io_string in file_dict.items():
    outfile = open("%s.txt" % key, "w")
    io_string.seek(0)
    outfile.write(io_string.read())
    outfile.close()
#################################

results in files of 16,390 bytes on Windows and 13,890 bytes on Linux,
with no "\x00" characters on either platform (the size difference
between Windows and Linux is due to line endings).  I'm still doing a
setdefault on the dictionary to create an object if the key doesn't
exist, but I'm using a cStringIO object rather than a file object.  So
I'm treating it just as if it were a file and writing it out to disk
later.
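
As a sanity check on those sizes, here is a minimal sketch that
recomputes them from the data alone, without touching the disk.  The
names values, size_lf and size_crlf are only illustrative and are not
part of the scripts above; it assumes one "\n" per line on Linux and
"\r\n" per line in text mode on Windows.

```python
# Expected size of one output file, e.g. a.txt: the lines "a0" .. "a2499".
values = ["a%s" % i for i in range(2500)]

# One "\n" byte per line, as written on Linux.
size_lf = sum(len(v) + 1 for v in values)

# Text mode on Windows writes "\r\n", so one extra byte per line.
size_crlf = size_lf + len(values)

print(size_lf)    # 13890
print(size_crlf)  # 16390
```

Both numbers match the file sizes I see, so the total length of each
file is what it should be; only the content at the start differs.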

Does anyone have any idea why the first version writes over 14,000
"\x00" characters at the start of each file, where printable characters
should go, and then writes the remainder of the file correctly?
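
One thing that may be worth ruling out (a guess, not a confirmed cause):
Python evaluates the arguments of a call before making the call, so the
second argument to setdefault is evaluated on every iteration, even when
the key is already in the dictionary.  In the first example that would
mean open(..., "w") runs once per item rather than once per file.  The
sketch below uses a hypothetical make_default() helper in place of open()
to count those evaluations.

```python
# Hypothetical stand-in for open(); records every time it is called.
calls = []

def make_default(key):
    calls.append(key)
    return []

d = {}
for key in ["a", "a", "a", "b"]:
    # make_default(key) runs before setdefault is called, whether or not
    # the key is already present; setdefault only decides which object
    # to return.
    d.setdefault(key, make_default(key)).append(1)

print(len(calls))  # 4: one evaluation per iteration
print(len(d))      # 2: only the first object per key was stored
```

Whether those repeated opens account for the "\x00" characters on
Windows is something I have not verified.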


Jeremy Jones


