[SPAM-Bayes] - Re: Converting IBM Floats..Help.. - Bayesian Filter detected spam

Ian Sparks Ian.Sparks at etrials.com
Thu Mar 25 14:17:16 EST 2004


Jeff,

Thanks so much for this, it works fine on the rest of my data too. I'm going to have to write code that does the reverse conversion as well, so I want to make sure I understand what happened here and can apply the same concepts. I have some instructions on the reverse process and I believe it's mostly bit-shifting.

This is somewhat "teach a man to fish", so I appreciate your bearing with me on this basic CS stuff. I think I'm nearly there...

def ibm360_decode(s):

    #Get element 0 of the unpack tuple
    #struct format : > = Big Endian, Q means unsigned long long 
    #not sure why we need an unsigned long long?
    #think it's because we need that to do bitwise operations?

    l = struct.unpack(">Q", s)[0]     

    #The sign is the first bit, we have 8 bytes = 64 bits. 
    #So shift the bits 63 to the right dropping everything but our sign bit off the end

    sign = l >> 63

    #Our exponent is 7 bits positioned 57-63
    #So we bit shift down 56 digits to drop off everything that isn't part of the exponent 

    exponent_with_sign = (l >> 56)

    #But we still have our sign-bit at the left side so we've got 8 digits
    #we need to cut off that extra digit
    #we can do this using the bitwise & to compare our number against 00111111 
    # (0x7f - 64 == 127 - 64 == 63 = 111111)
    # e.g. for 7 with the sign bit set we'd have
    # 135 = 10000111    (7 plus the sign bit)
    #  63 = 00111111 &
    #       --------
    #       00000111 = 7
    #  
    # But I'm clearly missing something here because 63 is only 6 binary-digits long and we need
    # to cut off the 8th...127 would seem to be more appropriate (but doesn't work). Umph ?

    exponent = exponent_with_sign & 0x7f - 64


    #Ok, moving on. 
    #The mantissa is the last 56 bits (reading from left->right) so we need to cut off the first
    #8 bits which means ANDing (&) with a large number representing all 1's for 56 of those bits
    #and zero for the others e.g. 00000000111111...1111

    #The / by 16. ** 14 has me scratching my head, I admit. Our mantissa is firmly planted in the
    #right hand side of our binary number isn't it? But we're dividing by a massive number....?

    mantissa = (l & ((1L<<56) - 1)) / (16. ** 14)

    #The instructions said the true exponent is 16 * the exponent value we extracted, again, not
    #sure why we're multiplying the mantissa up too?

    return [1,-1][sign] * (16**exponent) * mantissa
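
On the `0x7f - 64` puzzle: in Python, binary `-` binds more tightly than `&`, so `exponent_with_sign & 0x7f - 64` is parsed as `exponent_with_sign & (0x7f - 64)`, i.e. `& 63`. That happens to give the same answer as masking with 0x7f and then subtracting the excess-64 bias whenever the stored 7-bit exponent is 64 or more, which covers every test vector in this thread, but not for smaller stored exponents (values below 1/16). A minimal sketch in Python 3 syntax; both helper names are made up for illustration:

```python
def exponent_as_posted(byte0):
    # Parsed as byte0 & (0x7f - 64), i.e. byte0 & 63.
    return byte0 & 0x7f - 64

def exponent_excess_64(byte0):
    # Mask off the sign bit first, then remove the excess-64 bias.
    return (byte0 & 0x7f) - 64

# First byte of a few encoded values: sign bit plus 7-bit stored exponent.
for byte0 in (0x42, 0xC2, 0x41, 0x40, 0x3F):
    print(hex(byte0), exponent_as_posted(byte0), exponent_excess_64(byte0))
```

The two helpers agree on the first four bytes and disagree on 0x3F, where the stored exponent is below 64 and the true exponent is negative.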


-----Original Message-----
From: Jeff Epler [mailto:jepler at unpythonic.net]
Sent: Thursday, March 25, 2004 11:39 AM
To: Ian Sparks
Cc: Python-List at Python. Org (E-mail)
Subject: [SPAM-Bayes] - Re: Converting IBM Floats..Help.. - Bayesian
Filter detected spam


Note that Python floats are C doubles, which generally have a 53-bit
mantissa.  This means that not all ibm360 floats can be exactly
represented.  The exponent ranges may also not match up.  Finally, I
have no idea if ibm360 floats have special representations for cases
like denormal, NaN, infinity, etc. so these aren't handled.
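
To make the precision point concrete, here is a small sketch in Python 3 syntax. `ibm_exact` is a hypothetical helper that decodes with `fractions.Fraction` so nothing is rounded, and then checks whether converting to a float loses anything. It assumes the layout described in this thread: 1 sign bit, a 7-bit excess-64 exponent, and a 56-bit fraction.

```python
import struct
from fractions import Fraction

def ibm_exact(s):
    """Decode 8 bytes exactly as (-1)**sign * 16**(e - 64) * fraction / 2**56."""
    bits = struct.unpack(">Q", s)[0]
    sign = bits >> 63
    exponent = ((bits >> 56) & 0x7f) - 64
    fraction = Fraction(bits & ((1 << 56) - 1), 1 << 56)
    return (-1) ** sign * Fraction(16) ** exponent * fraction

exact_155 = ibm_exact(b"B\x9b\x00\x00\x00\x00\x00\x00")
all_ones = ibm_exact(b"\x40" + b"\xff" * 7)   # fraction uses all 56 bits

print(float(exact_155) == exact_155)   # short fraction: the float is exact
print(float(all_ones) == all_ones)     # 56 significant bits: the float rounds
```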

Basically, the approach is to pull the bits out of the original string
and then do some arithmetic to turn them into floats.
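
As a worked example of that approach, the sketch below (Python 3 syntax, so bytes literals rather than `str`) takes the 8-byte string for 155 apart by hand. It also suggests where the `16. ** 14` comes from: the 56 fraction bits are 14 hex digits that sit to the right of the radix point, so the raw integer has to be divided by 16**14 == 2**56 to become a fraction below 1. The variable names are illustrative only.

```python
import struct

raw = b"B\x9b\x00\x00\x00\x00\x00\x00"      # encodes 155
bits = struct.unpack(">Q", raw)[0]

sign = bits >> 63                            # top bit
stored_exponent = (bits >> 56) & 0x7f        # next 7 bits, excess-64
fraction_digits = bits & ((1 << 56) - 1)     # low 56 bits = 14 hex digits

print(sign)                                  # 0
print(stored_exponent - 64)                  # 2
print("%014x" % fraction_digits)             # 9b000000000000

# Read in hex: 0.9b * 16**2 == 9b hex == 155
fraction = fraction_digits / 16.0 ** 14
print(16 ** (stored_exponent - 64) * fraction)   # 155.0
```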

I don't know anything about ibm360 format, but I followed your
description, and the test passes..

import struct

def ibm360_decode(s):
    l = struct.unpack(">Q", s)[0]
    sign = l >> 63
    exponent = ((l >> 56) & 0x7f) - 64  # mask first: & binds more loosely than -
    mantissa = (l & ((1L<<56) - 1)) / (16. ** 14)
    return [1,-1][sign] * (16**exponent) * mantissa

def test():
    vectors = [
        (155, 'B\x9b\x00\x00\x00\x00\x00\x00'),
        (77, 'BM\x00\x00\x00\x00\x00\x00'),
        (1, 'A\x10\x00\x00\x00\x00\x00\x00'),
        (0, '\x00\x00\x00\x00\x00\x00\x00\x00'),
    ]

    for v, s in vectors:
        d = ibm360_decode(s)
        print v, d, `s`
        assert d == v

if __name__ == '__main__': test()
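
For the reverse conversion mentioned at the top of the thread, one possible approach is to normalize the value into [1/16, 1) by repeated division or multiplication by 16, then pack the three fields back together. This is only a sketch under assumptions: it is written in Python 3 syntax (bytes literals, no `1L`), `ibm360_encode` is a name invented here, it mirrors the layout used by the decoder rather than the instructions Ian refers to, and it rejects NaN, infinities, and out-of-range values instead of guessing how they should be encoded.

```python
import math
import struct

def ibm360_encode(x):
    """Pack a float as sign bit, excess-64 exponent, 56-bit fraction (8 bytes)."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("no encoding assumed for NaN or infinity")
    if x == 0:
        return b"\x00" * 8
    sign = 1 if x < 0 else 0
    x = abs(x)
    exponent = 0
    # Normalize so that 1/16 <= x < 1, counting powers of 16.
    while x >= 1:
        x /= 16.0
        exponent += 1
    while x < 1.0 / 16:
        x *= 16.0
        exponent -= 1
    if not -64 <= exponent <= 63:
        raise OverflowError("exponent does not fit in 7 bits")
    # Scaling by 16**14 == 2**56 is exact, so int() drops nothing here.
    fraction = int(x * 16.0 ** 14)
    bits = (sign << 63) | ((exponent + 64) << 56) | fraction
    return struct.pack(">Q", bits)

print(ibm360_encode(155.0))
print(ibm360_encode(1.0))
```

Because a double carries at most 53 significant bits, the low bits of the 56-bit fraction produced this way are always zero; round-tripping a value that was decoded from a full 56-bit fraction will not necessarily reproduce the original bytes.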



