Manipulate Large Binary Files

James Tanis jtanis at mdchs.org
Wed Apr 2 13:22:12 EDT 2008


"Derek Tracy" <tracyde at gmail.com> wrote:
> 
> INPUT = open(infile, 'rb')
> header = INPUT.read(169088)
> 
> ary = array.array('H', INPUT.read())
> 
> INPUT.close()
> 
> OUTF1 = open(outfile1, 'wb')
> OUTF1.write(header)
> 
> OUTF2 = open(outfile2, 'wb')
> ary.tofile(OUTF2)
> 
> 
> When I try to use the above on files over 2Gb I get:
>      OverflowError: requested number of bytes is more than a Python string can hold
> 
> Does anybody have an idea as to how I can get by this hurdle?
> 

If it were me, I'd loop until EOF and do smaller read/write operations
rather than attempt to put a whole 2 GB file into a single string. Even if it
were possible, you'd be using over 2 GB of RAM for a single operation.
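
A minimal sketch of that loop, assuming the goal is simply to split the file
into a header and a body. The split_file name and the 1 MiB chunk size are
my own choices for illustration, not anything from the original code; the
header size is the 169088 from the post.

```python
HEADER_SIZE = 169088        # header length taken from the original post
CHUNK_SIZE = 1024 * 1024    # 1 MiB per read; an arbitrary, tunable choice


def split_file(infile, outfile1, outfile2,
               header_size=HEADER_SIZE, chunk_size=CHUNK_SIZE):
    """Write the first header_size bytes to outfile1, the rest to outfile2."""
    with open(infile, 'rb') as src:
        # The header is small, so a single read is fine here.
        with open(outfile1, 'wb') as out1:
            out1.write(src.read(header_size))

        # Copy the remainder in fixed-size chunks so that no more than
        # chunk_size bytes are held in memory at once.
        with open(outfile2, 'wb') as out2:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:       # an empty read means EOF
                    break
                out2.write(chunk)
```

Called as split_file(infile, outfile1, outfile2), this never asks for more
than one chunk at a time, so the size of the input should not matter.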

Also, from what I understand, INPUT.read() returns a string, so ary =
array.array('c', INPUT.read()) might be more appropriate, but I'm not
positive.

Anyway, I took a short look through the array module, and using
ary.fromfile(f, n) might be more appropriate. In a loop, read some "machine
values" with ary.fromfile(f, n) and write them with ary.tofile(f). Catch the
EOFError when it is raised. I'd imagine that could work.
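
Roughly what I have in mind, as an untested-on-huge-files sketch. The
copy_items name and the chunk size are made up for illustration. It assumes
the data after the header is a whole number of 'H' items (a stray trailing
byte would make fromfile raise ValueError rather than EOFError), and it uses
a fresh array on every pass because fromfile appends to whatever the array
already holds.

```python
import array

ITEMS_PER_CHUNK = 512 * 1024    # items per read; an arbitrary choice


def copy_items(src, dst, typecode='H', items_per_chunk=ITEMS_PER_CHUNK):
    """Copy machine values from src to dst in chunks until EOF.

    src and dst are file objects opened in binary mode.
    """
    while True:
        ary = array.array(typecode)     # fresh array for each chunk
        try:
            ary.fromfile(src, items_per_chunk)
        except EOFError:
            # fromfile raises EOFError on a short read, but the items it
            # did manage to read are still in ary, so write them out.
            ary.tofile(dst)
            break
        ary.tofile(dst)
```

To match the original code, you would read the 169088-byte header from the
input first, write it to the first output file, and then pass the same input
file object to copy_items along with the second output file.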

--
James Tanis
Technology Coordinator
Monsignor Donovan Catholic High School

e: jtanis at mdchs.org
p: (706)433-0223





More information about the Python-list mailing list