Python 2.5, problems reading large (> 4 GBytes) files on win2k

casevh at gmail.com casevh at gmail.com
Sat Mar 3 03:03:06 EST 2007


On Mar 2, 10:09 am, padu... at cisco.com wrote:
> Folks,
>
> I've a Python 2.5 app running on 32 bit Win 2k SP4 (NTFS volume).
> Reading a file of 13 GBytes, one line at a time.  It appears that,
> once the read line passes the 4 GByte boundary, I am getting
> occasional random line concatenations.  Input file is confirmed good
> via UltraEdit.  Groovy version of the same app runs fine.
>
> Any ideas?
>
> Cheers

It appears to be a bug. I am able to reproduce the problem with the
code fragment below. It creates a 12GB file by writing lines of 0 to
126 bytes in sequence and repeating that set of lines 1500000 times.
The read check fails on W2K SP4 with both Python 2.4 and 2.5. It works
correctly on Linux (Ubuntu 6.10).
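
As a rough check on the 12GB figure, the expected file size can be
worked out from the line pattern. The sketch below is only an
illustration: the helper names bytes_per_pass and total_bytes are made
up here, and newline_bytes is an assumption about the platform (1 where
'\n' is written as-is, 2 where text mode writes '\r\n', as on Windows).

```python
# Estimate the size of the file produced by the test pattern.
# newline_bytes is an assumption about the platform: 1 where '\n' is
# written unchanged, 2 where text mode translates it to '\r\n'.

def bytes_per_pass(end=126, newline_bytes=1):
    # One pass writes lines of length 0, 1, ..., end, each followed
    # by a line terminator.
    payload = end * (end + 1) // 2
    return payload + (end + 1) * newline_bytes

def total_bytes(end=126, loops=1500000, newline_bytes=1):
    return bytes_per_pass(end, newline_bytes) * loops

if __name__ == '__main__':
    four_gib = 2 ** 32
    for nl in (1, 2):
        size = total_bytes(newline_bytes=nl)
        print('newline bytes: %d, total bytes: %d, beyond 4 GiB: %s'
              % (nl, size, size > four_gib))
```

Either way the file is well past the 4 GByte boundary mentioned in the
original question.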

I have reported it on SourceForge as bug 1672853.

# Read and write a huge file.
import sys

def write_file(end=126, loops=150, fname='bigfile'):
    # Write `loops` passes; each pass has lines of length 0, 1, ..., end.
    fh = open(fname, 'w')
    buff = 'A' * end
    for k in range(loops):
        for t in range(end+1):
            fh.write(buff[:t]+'\n')
    fh.close()

def read_file(end=126, fname='bigfile'):
    # Check that line lengths cycle through 0, 1, ..., end and stop at
    # the first line whose length breaks the pattern.
    fh = open(fname, 'r')
    offset = 0   # expected length of the next line
    loops = 0    # number of completed passes
    for rec in fh:
        if offset != len(rec.strip()):
            print 'Error at loop:', loops
            print 'Expected record length:', offset
            print 'Actual record length:', len(rec.strip())
            sys.exit(0)
        offset += 1
        if offset > end:
            offset = 0
            loops += 1
            if not loops % 10000: print loops   # progress report
    fh.close()

if __name__ == '__main__':
    write_file(loops=1500000)
    read_file()
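
Before committing to a 12GB run, the checking logic itself can be
exercised on a small file. The following is a scaled-down sketch, not
part of the bug report: write_pattern, find_first_mismatch and
self_check are hypothetical names, and the concatenation is simulated
by deleting one newline by hand. It returns the first bad line instead
of exiting, and avoids print statements so it runs on Python 2 and 3.

```python
import os
import tempfile

def write_pattern(fname, end=126, loops=3):
    # Same pattern as the reproduction: lines of length 0, 1, ..., end.
    fh = open(fname, 'w')
    try:
        buff = 'A' * end
        for _ in range(loops):
            for t in range(end + 1):
                fh.write(buff[:t] + '\n')
    finally:
        fh.close()

def find_first_mismatch(fname, end=126):
    # Return (line_index, expected_length, actual_length) for the first
    # line that breaks the pattern, or None if every line matches.
    fh = open(fname, 'r')
    try:
        expected = 0
        for index, rec in enumerate(fh):
            actual = len(rec.rstrip('\r\n'))
            if actual != expected:
                return (index, expected, actual)
            expected = (expected + 1) % (end + 1)
        return None
    finally:
        fh.close()

def self_check():
    # Build a small good file, then a copy with two lines joined by
    # hand (an artificial stand-in for the concatenation in the report).
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_pattern(path, end=5, loops=3)
        clean = find_first_mismatch(path, end=5)
        fh = open(path, 'r')
        data = fh.read()
        fh.close()
        fh = open(path, 'w')
        fh.write(data.replace('AA\nAAA\n', 'AAAAA\n', 1))
        fh.close()
        broken = find_first_mismatch(path, end=5)
        return clean, broken
    finally:
        os.remove(path)

if __name__ == '__main__':
    print(self_check())
```

If the small case passes and only the large file fails, that points at
file size rather than at the checking logic.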

casevh



