[Tutor] Problem When Iterating Over Large Test Files

William R. Wing (Bill Wing) wrw at mac.com
Thu Jul 19 05:04:37 CEST 2012


On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote:

> Thanks for the replies, I'll try to address the questions raised and
> spur further conversation.
> 
>> "those numbers (4GB and 64M lines) look suspiciously close to the file and record pointer limits to a 32-bit file system.  Are you sure you aren't bumping into wrap around issues of some sort?"
> 
> My understanding is that I am taking the files in a stream, one line
> at a time and never loading them into memory all at once.  I would
> like (and expect) my script to be able to handle files up to at least
> 50GB.  If this would cause a problem, let me know.
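For reference, the line-at-a-time pattern Ryan describes looks like this in Python (a minimal sketch; the function name and path are hypothetical, not from his script):

```python
def count_lines(path):
    """Count lines without ever holding the whole file in memory."""
    total = 0
    # Iterating over the file handle yields one line at a time,
    # so memory use stays flat regardless of file size.
    with open(path) as handle:
        for line in handle:
            total += 1
    return total
```

Reading this way keeps Python's memory footprint small, but it says nothing about what the underlying file system does to deliver those bytes, which is the point below.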

[Again, stripping out everything else…]

I don't think you understood my concern.  The issue isn't whether or not the files are being read as a stream; the issue is that at sizes like those, a 32-bit file system can fail silently.  If the pointers chaining allocation blocks together (or whatever Windows calls them) can't index to sufficiently large numbers, then you WILL get garbage included in the file stream.
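The back-of-the-envelope arithmetic behind that suspicion (my own check, assuming 32-bit unsigned offsets, not something stated in the thread):

```python
# Largest byte offset a 32-bit unsigned pointer can address:
max_offset = 2 ** 32          # 4294967296 bytes, i.e. 4 GiB

# 64M lines, as reported earlier in the thread:
lines = 64 * 2 ** 20          # 67108864

# Average line length at which both limits are hit simultaneously:
avg_line = max_offset // lines
print(avg_line)               # 64 bytes per line
```

A file of ~64-byte lines would cross the 4 GiB mark right around 64M lines, which is why the two reported numbers together look like a 32-bit boundary rather than a coincidence.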

If you copy those files to a different device (one that has just been scrubbed and reformatted), then copy them back and get different results with your application, you've found your problem.

-Bill

