Simple converter of files into their hex components... but I can't align the utf-8 parts!

blatt447477 at gmail.com
Sun Jun 9 17:06:58 EDT 2013


Hi all,
I developed a script which, IMHO, is more useful than the well-known
"hexdump" command.
Unfortunately I can't easily handle the UTF-8 encoding,
so in my output there is a loss of synchronization between
the literal and the hex part...
The script is not very long, but it is not written very well (no functions,
no classes...). I didn't manage to state my doubts in
a more concise way, so here it is in full!

# -*- coding: utf-8 -*-
# px.py          # python 2.6.6
nLenN=3          # n. of digits for lines

# hex conversion on 2 lines (except spaces)
# various run options: std      :             python px.py file
#                      bash cat : cat  file | python px.py (alias hex)
#                      bash echo: echo line | python px.py    "    "

# works on any n. of bytes for utf-8

import os, sys
import signal
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

try:
    sFN=sys.argv[1]
    f=open(sFN)
    lF=f.readlines()
    f.close()
except:
    sHD=sys.stdin.read().replace('\n','~\n')
    lF=sHD.split('\n')
    for n in xrange(len(lF)):
        lF[n]=lF[n].replace('~','\n')

#################################################################

lP=[]
for n in xrange(len(lF)):

    lP.append(str(n+1).zfill(nLenN)+' '+lF[n])
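    # split on spaces so that each word is converted separately and the
    # spaces stay blank in the hex rows; a plain split needs no marker
    # characters, which would clash with a real '~' or '!' in the text
    lNoSpaces=lF[n].split(' ')
    sHexH=sHexL=' ' * nLenN +' '
    for k in xrange(len(lNoSpaces)):
        sHexNT=lNoSpaces[k].encode('hex')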

        sH=''
        for c in xrange(0,len(sHexNT),2):
            sH += sHexNT[c]
        sHexH += sH+' '

        sL=''
        for c in xrange(1,len(sHexNT),2):
            sL  += sHexNT[c]
        sHexL +=  sL+' '

    lP.append(sHexH+'\n')
    lP.append(sHexL+'\n\n')    # to jump a line

# the insertion of one or more spaces after the non-ASCII characters must be
# done manually on the output (lP)
print ''.join(lP)
#--------------------------------------------------------------

print '---------------------\n'
for n in xrange(0,len(lP),3):
    try:
        lP[n].encode('utf-8')
    except:
        print lP[n],    # to be modified by hand in presence of utf-8 chars
        print lP[n+1],  #     to synchronize ascii and hex
        print lP[n+2],

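As you can see, it is a hex conversion on 2 lines (spaces are left as they are), which
has various run options: std      :             python px.py file
                         bash cat : cat  file | python px.py (alias hex)
                         bash echo: echo line | python px.py    "    "

Besides that, it should work (once I solve my problems) with UTF-8 characters
of any number of bytes.
As an example of such problems, compare the output for a plain ASCII line (004)
with the output for a line containing UTF-8 characters (005). In 005 the hex rows
drift away from the literal line after each accented letter, because each of
them takes two bytes but only one column in the literal line...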

004 # qwerty: not   unicode but   ascii
    2 7767773 667   7666666 677   676660
    3 175249a ef4   5e93f45 254   13399a

005 # qwerty: non è unicode bensì ascii
    2 7767773 666 ca 7666666 6667ca 676660
    3 175249a efe 38 5e93f45 25e33c 13399a
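
One possible way to avoid the manual step, as a rough sketch and not a tested replacement for px.py: decode the line to text first, then walk through it character by character and pad the literal row with one extra space for every extra byte the character takes in UTF-8. The name hex_rows is made up for this example. The sketch assumes Python 3 (or, on Python 2, a line that has already been decoded to unicode) and a terminal that shows every character one column wide.

```python
def hex_rows(line):
    """Return (literal, high, low) rows for one line of text.

    Hypothetical helper, not part of px.py. `line` must be text
    (already decoded from UTF-8), not raw bytes.
    """
    literal, high, low = [], [], []
    for ch in line:
        if ch == ' ':
            # spaces stay blank in all three rows, as in px.py
            literal.append(' ')
            high.append(' ')
            low.append(' ')
            continue
        # one 2-digit hex string per UTF-8 byte of this character
        pairs = ['%02x' % b for b in bytearray(ch.encode('utf-8'))]
        # pad the literal so it is as wide as its hex digits
        literal.append(ch + ' ' * (len(pairs) - 1))
        high.append(''.join(p[0] for p in pairs))
        low.append(''.join(p[1] for p in pairs))
    return ''.join(literal), ''.join(high), ''.join(low)


# the "non è unicode" part of line 005
for row in hex_rows(u'non \u00e8 unicode'):
    print(row)
```

For the "non è" part of line 005 this gives the same hex digits as above ("666 ca" and "efe 38"), with the literal row padded by one space after the "è", so the following words line up again. Characters that the terminal draws two columns wide would still need separate handling.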

Thanks in advance for any help!
Blatt


