Hex editor display - can this be more pythonic?
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Sun Jul 29 15:53:55 EDT 2007
On Sun, 29 Jul 2007 12:24:56 -0700, CC wrote:
> ln = '\x00\x01\xFF 456\x0889abcde~'
> import sys
> for c in ln:
>     sys.stdout.write( '%.2X ' % ord(c) )
>
> or this:
>
> sys.stdout.write( ' '.join( ['%.2X' % ord(c) for c in ln] ) + ' ' )
>
> Either of these produces the desired output:
>
> 00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E
>
> I find the former more readable and simpler. The latter however has a
> slight advantage in not putting a space at the end unless I really want
> it. But which is more pythonic?
I would use the second with fewer spaces, a longer name for `ln` and in
recent Python versions with a generator expression instead of the list
comprehension:
    sys.stdout.write(' '.join('%02X' % ord(c) for c in line))
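
A minimal sketch of the same idea for current Python, assuming Python 3 and a `bytes` input (both assumptions; the thread itself uses Python 2 strings). The function name `hex_bytes` is made up for illustration.

```python
def hex_bytes(data):
    """Return the bytes of `data` as space-separated, two-digit uppercase hex."""
    # Assumption: Python 3, where iterating over bytes yields ints,
    # so the ord() call from the Python 2 version is not needed.
    return ' '.join('%02X' % byte for byte in data)


line = b'\x00\x01\xFF 456\x0889abcde~'
print(hex_bytes(line))  # 00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E
```

Because `join` only puts the separator between items, there is no trailing space to strip.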
> The next step consists of printing out the ASCII printable characters.
> I have devised the following silliness:
>
> printable = ' 1!2@3#4$5%6^7&8*9(0)aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ\
> `~-_=+\\|[{]};:\'",<.>/?'
I'd use `string.printable` and remove the "invisible" characters like '\n'
or '\t'.
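
One way this could look, as a sketch assuming Python 3 and `bytes` input. `VISIBLE` and `ascii_column` are hypothetical names, and keeping the plain space as "visible" is a choice made here to match the desired output quoted in this thread.

```python
import string

# string.printable includes the whitespace characters ' \t\n\r\x0b\x0c'.
# Drop all of them, then add the plain space back (an assumption made here).
VISIBLE = (set(string.printable) - set(string.whitespace)) | {' '}


def ascii_column(data, placeholder='.'):
    """Render bytes as text, substituting `placeholder` for invisible bytes."""
    # Assumption: Python 3, so each item of `data` is an int.
    return ''.join(chr(b) if chr(b) in VISIBLE else placeholder for b in data)


print(ascii_column(b'\x00\x01\xFF 456\x0889abcde~'))  # ... 456.89abcde~
```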
> for c in ln:
>     if c in printable: sys.stdout.write(c)
>     else: sys.stdout.write('.')
>
> print
>
> Which when following the list comprehension based code above, produces
> the desired output:
>
> 00 01 FF 20 34 35 36 08 38 39 61 62 63 64 65 7E ... 456.89abcde~
>
> I had considered using the .translate() method of strings, however this
> would require a larger translation table than my printable string.
The translation table can be created once and should be faster.
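
A sketch of the table approach, assuming Python 3, where `bytes.translate()` takes a 256-byte table. `TABLE` and `ascii_column` are illustrative names, and no timing was measured here; the point is only that the per-character test happens once, when the table is built.

```python
import string

VISIBLE = (set(string.printable) - set(string.whitespace)) | {' '}

# Built once at import time: visible bytes map to themselves,
# every other byte value maps to '.'.
TABLE = bytes(b if chr(b) in VISIBLE else ord('.') for b in range(256))


def ascii_column(data):
    """Translate a bytes object into its printable-ASCII column."""
    # Every table entry is ASCII, so decoding the result cannot fail.
    return data.translate(TABLE).decode('ascii')


print(ascii_column(b'\x00\x01\xFF 456\x0889abcde~'))  # ... 456.89abcde~
```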
> I'd like to display the non-printable characters differently, since they
> can't be distinguished from genuine period '.' characters. Thus, I may
> use ANSI escape sequences like:
>
> for c in ln:
>     if c in printable: sys.stdout.write(c)
>     else:
>         sys.stdout.write('\x1B[31m.')
>         sys.stdout.write('\x1B[0m')
>
> print
`re.sub()` might be an option here.
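
A sketch of how `re.sub()` could be applied, assuming Python 3 and treating only 0x20 through 0x7E as printable (an assumption; the hand-written `printable` string above may differ). The names `NON_PRINTABLE` and `highlight` are made up. Unlike the quoted loop, this version colours each run of non-printable bytes with a single pair of escape sequences.

```python
import re

RED = '\x1b[31m'
RESET = '\x1b[0m'

# One or more characters outside the visible ASCII range 0x20..0x7E.
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]+')


def highlight(data):
    """Return text where each run of non-printable bytes becomes red dots."""
    # latin-1 maps every byte value to exactly one character.
    text = data.decode('latin-1')
    return NON_PRINTABLE.sub(
        lambda match: RED + '.' * len(match.group()) + RESET, text)


print(highlight(b'\x00\x01\xFF 456\x0889abcde~'))
```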
> I'm also toying with the idea of showing hex bytes together with their
> ASCII representations, since I've often found it a chore to figure out
> which hex byte to change if I wanted to edit a certain ASCII char. Thus,
> I might display data something like this:
>
> 00(\0) 01() FF() 20( ) 34(4) 35(5) 36(6) 08(\b) 38(8) 39(9) 61(a) 62(b)
> 63(c) 64(d) 65(e) 7E(~)
>
> Where printing chars are shown in parentheses, characters with Python
> escape sequences will be shown as their escapes in parentheses, while
> non-printing chars with no escapes will be shown with nothing in parentheses.
For escaping:
In [90]: '\n'.encode('string-escape')
Out[90]: '\\n'
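
The `string-escape` codec exists only in Python 2. A rough Python 3 sketch of the `hex(char)` display, assuming `unicode_escape` as a stand-in (an assumption, not what the session above uses); `annotate` and `VISIBLE` are hypothetical names. Note that this codec writes NUL and backspace as `\x00` and `\x08` rather than `\0` and `\b`, so in this sketch they end up with empty parentheses.

```python
import string

VISIBLE = (set(string.printable) - set(string.whitespace)) | {' '}


def annotate(data):
    """Format each byte as HH(x), where x is the character, a short escape, or empty."""
    parts = []
    for byte in data:
        char = chr(byte)
        if char in VISIBLE:
            label = char
        else:
            escaped = char.encode('unicode_escape').decode('ascii')
            # Keep short escapes such as \n or \t; drop generic \xNN forms.
            label = '' if escaped.startswith('\\x') else escaped
        parts.append('%02X(%s)' % (byte, label))
    return ' '.join(parts)


print(annotate(b'\x00\n A\xFF'))  # 00() 0A(\n) 20( ) 41(A) FF()
```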
Ciao,
Marc 'BlackJack' Rintsch