File Read issue by using module binascii

Sun Apr 28 09:32:49 EDT 2013

On Sunday, April 28, 2013 8:04:04 PM UTC+8, Jens Thoms Toerring wrote:
> Tim Roberts <timr at probo.com> wrote:
> > Jimmie He <jimmie.he at gmail.com> wrote:
> 
> > >When I run read_bmp() on example.bmp (about 100 KB), the shell becomes unresponsive ("No response"). When I change f.read() to f.read(1000), it is OK. Could someone tell me the exact reason for this?
> > >Thank you in advance!
> > >
> > >Python code below:
> > >
> > >import binascii
> > >
> > >def read_bmp():
> > >    f = open('example.bmp','rb')
> > >    rawdata = f.read()                       #f.read(1000) is ok
> > >    hexstr = binascii.b2a_hex(rawdata)       #Get a hex string
> > >    bsstr = bin(int(hexstr,16))[2:]
> 
> > I suspect the root of the problem here is that you don't understand what
> > this is actually doing.  You should run this code in the command-line
> > interpreter, one line at a time, and print the results.
> 
> > The read() call produces a string with 100k bytes.  The b2a_hex then
> > produces a string with 200k bytes.  Then, int(hexstr,16) takes that 200,000
> > byte hex string and converts it to an integer, roughly equal to 10 to the
> > 240,000 power, a number with some 240,000 decimal digits.  You then convert
> > that integer to a binary string: '0b' followed by up to 800,000 digits.
> > You then drop the '0b' prefix and print the remaining digits (up to
> > 800,000 of them, since leading zero bits are not emitted), each of which
> > will be either '0' or '1'.
> 
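To make the size arithmetic concrete, here is a Python 3 sketch of the same pipeline, wrapped in a hypothetical helper hex_route() written only for illustration. It runs on a tiny input and on a made-up 100,000-byte input that starts with 'B' (0x42), as every BMP file does. It also shows a detail the counts gloss over: bin() never emits leading zero bits, so the result can be shorter than 8 digits per byte.

```python
import binascii

def hex_route(rawdata):
    """The OP's pipeline: bytes -> hex string -> big integer -> binary digits."""
    hexstr = binascii.b2a_hex(rawdata)  # 2 hex digits per input byte
    return bin(int(hexstr, 16))[2:]     # [2:] drops bin()'s '0b' prefix

# Tiny input: 2 bytes = 16 bits, but the 12 leading zero bits are not emitted
print(hex_route(b'\x00\x0f'))           # 1111

# Hypothetical 100,000-byte input starting with 'B' (0x42), like a BMP file
sample = b'B' + bytes(99_999)
print(len(binascii.b2a_hex(sample)))    # 200000 hex digits
print(len(hex_route(sample)))           # 799999 digits: 0x42 has one leading zero bit
```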
> > I am absolutely, positively convinced that's not what you wanted to do.
> > What point is there in printing out the binary equivalent of a bitmap?
> 
> > Even if you did, it would be much quicker for you to do the conversion one
> > byte at a time, completely skipping the conversion to hex and then the
> > creation of a massive multi-precision number.  Example:
> 
> >     with open('example.bmp', 'rb') as f:
> >         rawdata = f.read()
> >     bsstr = []
> >     for b in rawdata:
> >         # drop bin()'s '0b' prefix and pad each byte to 8 digits
> >         bsstr.append( bin(ord(b))[2:].zfill(8) )
> >     bsstr = ''.join(bsstr)
> 
> > or even:
> >     with open('example.bmp', 'rb') as f:
> >         bsstr = ''.join( bin(ord(b))[2:].zfill(8) for b in f.read() )
> 
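The snippets above are Python 2 (ord() on one-character strings). A minimal Python 3 sketch of the same byte-at-a-time idea, using a hypothetical function name per_byte(), could look like this. In Python 3, iterating over bytes already yields integers, and format(b, '08b') produces exactly 8 zero-padded digits with no '0b' prefix to strip.

```python
def per_byte(rawdata):
    """Byte-at-a-time conversion in Python 3: exactly 8 binary digits per byte."""
    # Iterating over bytes yields ints in Python 3, so ord() is not needed;
    # format(b, '08b') gives zero-padded digits without a '0b' prefix.
    return ''.join(format(b, '08b') for b in rawdata)

print(per_byte(b'BM'))   # 0100001001001101
```

Reading the file would then be: with open('example.bmp', 'rb') as f: bits = per_byte(f.read())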
> Exactly my idea at first. But then I started to time it (using
> the timeit module) by comparing the following functions:
> 
>   # Original version
> 
>   def c1( rawdata ) :
>       h = binascii.b2a_hex( rawdata )
>       z = bin( int( h, 16 ) )[ 2 : ]
>       # bin() drops leading zero bits, so pad back to 8 bits per byte
>       return '0' * ( 8 * len( rawdata ) - len( z ) ) + z
> 
>   # Convert each byte directly
> 
>   def c2( rawdata ) :
>       return ''.join( bin( ord( x ) )[ 2 : ].rjust( 8, '0' ) for x in rawdata )
> 
>   # Convert each byte using a list for table look-up
> 
>   def c3( rawdata ) :
>       h = [ bin( i )[ 2 : ].rjust( 8, '0' ) for i in range( 256 ) ]
>       return ''.join( h[ ord( x ) ] for x in rawdata )
> 
>   # Convert each byte using a dictionary for table look-up (avoids
>   # lots of ord() calls)
> 
>   def c4( rawdata ) :
>       h = { chr( i ) : bin( i )[ 2 : ].rjust( 8, '0' ) for i in range( 256 ) }
>       return ''.join( h[ x ] for x in rawdata )
> 
> As you can see, in c3() and c4() I even tried to speed things up
> further by using a table look-up instead of calling bin() etc.
> on each byte. But the result was that c2() is nearly 15 times
> slower than c1(), c3() about 3 times slower and c4() still more
> than 2 times slower! So the method the OP uses seems to be quite
> a bit more efficient than one might be tempted to assume.
> 
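For readers on Python 3, a rough port of this benchmark might look like the sketch below. Assumptions: iterating over bytes already yields integers, so ord() disappears and c4()'s chr()-keyed dictionary no longer has anything to save over c3()'s list, which is why it is omitted; os.urandom() stands in for the real BMP data. Relative timings depend on interpreter version and hardware, so no figures are claimed here.

```python
import binascii
import os
import timeit

def c1(rawdata):
    # Hex route; zfill restores the leading zero bits that bin() omits
    z = bin(int(binascii.b2a_hex(rawdata), 16))[2:]
    return z.zfill(8 * len(rawdata))

def c2(rawdata):
    # One bin() call per byte; iterating bytes yields ints in Python 3
    return ''.join(bin(x)[2:].rjust(8, '0') for x in rawdata)

def c3(rawdata):
    # Lookup table indexed by byte value, built per call as in the original c3()
    h = [bin(i)[2:].rjust(8, '0') for i in range(256)]
    return ''.join(h[x] for x in rawdata)

if __name__ == '__main__':
    data = os.urandom(100_000)                # random stand-in for a ~100 KB BMP
    assert c1(data) == c2(data) == c3(data)   # all variants agree before timing
    for fn in (c1, c2, c3):
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f'{fn.__name__}: {seconds:.3f} s for 5 runs')
```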
> I would guess that the reason is that c1() makes just a small
> number of calls to functions that are probably implemented not
> in Python but in C, and thus can be a lot faster than anything
> you could achieve in Python, while the other functions use a
> for loop in Python, which seems to account for a good part of
> the CPU time used. To test for that I split the 'rawdata' string
> into a list of characters (i.e. single-letter strings) and
> reassembled it using join() and a for loop:
> 
>     r = list( rawdata )
>     z = ''.join( x for x in r )
> 
> The second line alone took about 1.7 times longer than the
> whole, seemingly convoluted c1() function!
> 
> What I take away from this is that a lot of the assumptions one
> is prone to make when coming from e.g. a C/C++ background can
> be quite misleading when extrapolated to Python (or other
> interpreted languages)...
> 
>                           Best regards, Jens
> -- 
>   \   Jens Thoms Toerring  ___      jt at toerring.de
>    \__________________________      http://toerring.de
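Following Jens's reasoning (keep the per-byte work inside functions implemented in C), Python 3.2 and later offer int.from_bytes(), which builds the big integer directly from the bytes and skips the hex string entirely. This is a swapped-in technique that was not discussed in the thread, and the function name via_from_bytes() is hypothetical; whether it beats c1() on a given interpreter is something to measure rather than assume.

```python
import binascii

def via_from_bytes(rawdata):
    """Python 3.2+: turn the bytes into one big integer without a hex detour."""
    if not rawdata:
        return ''
    n = int.from_bytes(rawdata, 'big')             # first byte is most significant
    return format(n, 'b').zfill(8 * len(rawdata))  # pad back to 8 bits per byte

# Cross-check against the hex route on ~100 KB of sample bytes (first byte is 0)
data = bytes(range(256)) * 400
expected = bin(int(binascii.b2a_hex(data), 16))[2:].zfill(8 * len(data))
assert via_from_bytes(data) == expected
```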
Hi Jens, Peter & Tim,
   Thank you very much for your wonderful analysis of my newbie question.
   I admit that I posted this question too early, because I just wanted some guru to inspire me ;-) If it really confused you, excuse my noise :-)
   What I intend to do is to make a BMP font maker (convert the BMP to a data array). What I did wrong was to print it directly to the screen, without understanding it at all at first.
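For the font-maker goal, a minimal sketch of "BMP to data array" might look like the following. Assumptions: the whole file is read into memory; only the 14-byte file header is parsed (the 'BM' signature and the little-endian pixel-data offset at byte 10); the function names and the C-style array output are hypothetical choices for illustration. A real converter would also need the info header (width, height, bits per pixel), because BMP rows are padded to a multiple of 4 bytes and are normally stored bottom-up.

```python
import struct

def bmp_pixels(bmp):
    """Return the pixel-array bytes of an in-memory BMP file.

    Only the 14-byte file header is consulted: the b'BM' signature and the
    little-endian 32-bit pixel-data offset stored at byte 10.
    Row padding and bottom-up row order are NOT handled here.
    """
    if bmp[:2] != b'BM':
        raise ValueError('not a BMP file')
    (offset,) = struct.unpack_from('<I', bmp, 10)
    return bmp[offset:]

def to_c_array(name, data, per_line=12):
    """Format bytes as a C array initializer (a hypothetical output format)."""
    rows = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        rows.append('    ' + ', '.join(f'0x{b:02X}' for b in chunk) + ',')
    return f'const unsigned char {name}[{len(data)}] = {{\n' + '\n'.join(rows) + '\n};'

# Usage with a hypothetical array name:
#     with open('example.bmp', 'rb') as f:
#         print(to_c_array('font_bitmap', bmp_pixels(f.read())))
```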
   c1()~c4(), which Jens provided, clearly show that we should think about efficiency, because Python is an interpreted language.
   Anyway, thanks for all your kind help :-)