Python vs. huge files PROBLEM!!

djw dwelch91 at nospam.attbi.com
Fri Jul 19 10:50:11 EDT 2002


Jose Rivera wrote:
> Hi..
> Scenario:
> OS      : WinNT 4.0
> FileName: RESPALDO_MENSUAL_Data.MDF
> Size    : 243,386,941,440 bytes
> 
> Problem:
> I want to copy this file to another disk. Both disks have 500 GB of
> free space.
> 
> Microsoft problem:
>    You May Not Be Able to Copy Large Files on Computers That Are
> Running Windows NT 4.0 or Windows 2000 (Q259837)
> 
> Workaround suggested by Microsoft:
>    Use Backup / Restore utilities
>   Result: They didn't work either... using HP OmniBack
> 
> Workaround made by us:
> Make a Python program that reads and writes to the other file on
> the other disk.
> 
> Python Code:
> 
> import sys
> 
> if len(sys.argv) != 3:
> 	print 'Format:'
> 	print '\t pyCopy.exe SourceFile EndFile'
> else:
> 	fn1=sys.argv[1]
> 	fn2=sys.argv[2]
> 	f1=open(fn1,'rb')
> 	f2=open(fn2,'wb')
> 	data=f1.read(1024*1000)
> 	while data:
> 		f2.write(data)
> 		data=f1.read(1024*1000)
> 	f1.close()
> 	f2.close()
> 
> Result:
> IOError: [Errno 22] Invalid argument
> 
> Question:
> Is there anything wrong?



If buffered I/O doesn't work, how about non-buffered I/O?

According to MSDN:

Use CreateFile for Non-Buffered File I/O
If your application performs file input or output without using the 
intermediate buffering or caching provided by the system, the 
application must call CreateFile with the FILE_FLAG_NO_BUFFERING flag 
set when opening the file. In this case, your application must pass a 
buffer to ReadFile or WriteFile that is correctly aligned for the 
device. Note that the alignments changed for some devices with Windows 
2000. For more information, see the FILE_FLAG_NO_BUFFERING description 
in the CreateFile section and the VirtualAlloc section.
One way to align buffers on integer multiples of the volume sector size 
is to use VirtualAlloc to allocate the buffers. This function allocates 
memory that is aligned on addresses that are integer multiples of the 
operating system's memory page size. Because both memory page and volume 
sector sizes are powers of 2, this memory is also aligned on addresses 
that are integer multiples of a volume's sector size. Your application 
must make sure that it reads and writes in multiples of the actual 
sector size of the input or output device. An application can determine 
a volume's sector size by calling the GetDiskFreeSpaceEx function.
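
To make the "multiples of the sector size" rule concrete, here is a minimal sketch that rounds a transfer size up to a whole number of sectors. The helper name round_up_to_sector is made up for illustration, and the sector size has to come from the actual volume; nothing here is taken from MSDN.

```python
def round_up_to_sector(nbytes, sector_size):
    """Return the smallest multiple of sector_size that is >= nbytes."""
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    # Integer ceiling division, then scale back up to bytes.
    return ((nbytes + sector_size - 1) // sector_size) * sector_size
```

For example, round_up_to_sector(1000000, 4096) gives 1003520, while a size that is already a multiple of the sector size comes back unchanged.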


So how about some code that looks vaguely like this (only tested a 
little bit, no error checking and no guarantees!)
(Note the use of win32con.FILE_FLAG_NO_BUFFERING):


import sys
import win32file, win32con, win32api

fn1=sys.argv[1]
fn2=sys.argv[2]

f1 = win32file.CreateFile( fn1,
                            win32con.GENERIC_READ,
                            win32con.FILE_SHARE_READ,
                            None,
                            win32con.OPEN_EXISTING,
                            win32con.FILE_FLAG_NO_BUFFERING,
                            0)

f2 = win32file.CreateFile( fn2,
                            win32con.GENERIC_WRITE,
                            win32con.FILE_SHARE_WRITE,
                            None,
                            win32con.CREATE_ALWAYS,
                            win32con.FILE_FLAG_NO_BUFFERING,
                            0)

# bad assumption of using C: in next line! Need to change...
spc, bps, fc, tc  = win32file.GetDiskFreeSpace( "c:\\" )
bpc = spc * bps # = 4096 on my XP box

while 1:

     hr, r1 = win32file.ReadFile( f1,
                                  bpc )

     e, bw  = win32file.WriteFile( f2,
                                   r1 )

     if bw == 0: break

win32api.CloseHandle( f1 )
win32api.CloseHandle( f2 )


Worked on my system for a 19 MB file... way smaller than yours, but
I don't have many 243 GB files lying around... in fact this was the
largest file on my hard drive!

I could not figure out how to tell when ReadFile() was complete. The 
parameter that is usually passed back from the Win32 API, 
lpNumberOfBytesRead, is not returned by the Python wrapping of the 
function (don't know why not). However, the check for bytes written 
(returned by win32file.WriteFile()) seems to do the trick.
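
One possible alternative, as far as I can tell: the Python wrapper hands back the data itself, so the length of that data should be the number of bytes read, and an empty result would mean end of file. Below is a minimal sketch of a loop built on that idea. The names copy_chunks, read_chunk and write_chunk are hypothetical; the two callables stand in for thin wrappers around win32file.ReadFile() and win32file.WriteFile(), so the loop can be tried on ordinary file objects first.

```python
import io

def copy_chunks(read_chunk, write_chunk, chunk_size):
    """Copy until read_chunk returns no data; return total bytes copied.

    read_chunk(n) must return up to n bytes (empty at end of file).
    write_chunk(data) must write all of data.
    """
    total = 0
    while True:
        data = read_chunk(chunk_size)
        if not data:          # empty read: end of file, nothing to write
            break
        write_chunk(data)
        total += len(data)
    return total

# Try it on in-memory files; 10000 is deliberately not a multiple of 4096.
src = io.BytesIO(b"x" * 10000)
dst = io.BytesIO()
copied = copy_chunks(src.read, dst.write, 4096)
```

With the win32file handles, read_chunk could be something like lambda n: win32file.ReadFile(f1, n)[1] (untested). Note that this sketch does nothing about the last, short chunk: with FILE_FLAG_NO_BUFFERING the write size is supposed to be a multiple of the sector size, so that final chunk would probably need separate handling.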

Also, note that I used GetDiskFreeSpace(), not the Ex version. I think
there is an error in MS's docs - the Ex version doesn't return the
cluster sizes and such like the non-Ex version.
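
As for the hard-coded "c:\\" in the GetDiskFreeSpace() call, here is a rough sketch of how the volume root could be taken from the file name instead. The helper name volume_root is made up, and I am assuming the path given on the command line includes a drive letter or a UNC share; ntpath is used so the splitting follows Windows rules wherever the code runs.

```python
import ntpath

def volume_root(path):
    """Return the drive root (e.g. 'D:\\') for a Windows-style path."""
    drive, _rest = ntpath.splitdrive(path)
    if not drive:
        # Relative paths carry no drive; the caller would need to make
        # the path absolute first.
        raise ValueError("path has no drive or share component: %r" % (path,))
    return drive + "\\"
```

The result would then be passed to win32file.GetDiskFreeSpace() in place of the literal, once for the source and once for the destination if they sit on different volumes.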

Regards,

Don



