[Python-ideas] hexdump

Sat May 12 18:18:33 CEST 2012

On 5/12/2012 4:59 AM, anatoly techtonik wrote:
> Just an idea of usability fix for Python 3.
> hexdump module (function or bytes method is better) as simple, easy
> and intuitive way for dumping binary data when writing programs in
> Python.
>
> hexdump(bytes)   - produce human readable dump of binary data,
> byte-by-byte representation, separated by space, 16-byte rows

Hexdump, as you propose it, does three things. In each case, it fixes a 
parameter that could reasonably have a different value.

1. Splits the hex characters into groups of two characters, each 
representing one byte. For some uses, larger chunks would be more useful.

2. Uppercases the alpha hex characters. This is a holdover from the 
ancient all-uppercase world, where there was no choice. While it may 
make the block visually more 'even' and 'aesthetic' when not actually 
being read, it makes it harder to tell the difference between a 0-9 
digit and an alpha digit. B and 8 become very similar. There is 
justification for binascii.hexlify using lowercase.

3. Groups the hex-represented units into lines of 16 each. This is only 
useful when the bytes come from memory with hex addresses, when the 
point is to determine the specific bytes at specific addresses. For 
displaying decimal-length byte strings, 25 bytes per line would be better.

What it does not do.

4. Break lines into blocks. One might want to break up multiple lines of 
25 into blocks of four lines each.

5. Label the rows and columns with either hex or decimal labels.

6. Add 'dotted ascii' translation to reveal embedded ascii strings.
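
As a rough illustration of points 5 and 6, here is a minimal sketch of one
labelled row with a dotted-ascii column. The name annotated_row and its
parameters are hypothetical, made up for this example; they are not part of
the proposal or of any existing module.

```python
def annotated_row(data, offset=0, width=16):
    """Return one labelled row: hex offset, hex bytes, dotted ASCII.

    Hypothetical helper for illustration only.
    """
    row = data[offset:offset + width]
    # Iterating over bytes in Python 3 yields ints.
    hex_part = " ".join("%02x" % b for b in row)
    # Printable ASCII (0x20..0x7e) is shown as-is, everything else as '.'.
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
    # Pad the hex column so the ASCII column lines up on a short last row.
    return "%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, ascii_part)


print(annotated_row(b"Hi\x00\xff", width=4))
# 00000000  48 69 00 ff  Hi..
```

A decimal label would only need "%08d" in place of "%08x".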

Output: choices are an iterator of lines, a list of lines, and a string 
with embedded newlines. The second and third are easily derived from the 
first, so I propose the first as the best choice. An iterator can also be 
used to write to a file.
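
To make this concrete, here is a minimal sketch of a generator that leaves
the three parameters from points 1-3 open and yields lines. The name
hexdump_lines and its keyword arguments are assumptions for the sake of the
example, not an existing API.

```python
import binascii


def hexdump_lines(data, group=1, per_line=16, upper=False):
    """Yield one line of hex text for each `per_line` bytes of `data`.

    Hypothetical sketch: `group` is the number of bytes per
    space-separated unit, `upper` selects uppercase hex digits.
    """
    for start in range(0, len(data), per_line):
        row = data[start:start + per_line]
        units = []
        for i in range(0, len(row), group):
            text = binascii.hexlify(row[i:i + group]).decode("ascii")
            units.append(text.upper() if upper else text)
        yield " ".join(units)


data = bytes(range(20))
as_list = list(hexdump_lines(data))          # a list of lines
as_string = "\n".join(hexdump_lines(data))   # a string with embedded newlines
for line in hexdump_lines(data, per_line=25):
    print(line)                              # or write each line to a file
```

The list and the string are each one call away from the iterator, which is
the reason for preferring it as the basic form.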

A flexible module would be a good addition to PyPI if not there already. 
Let's see...

hexencoder 1.0
hex encode decode and compare
This project offers 3 basic tools for manipulating binary files: 1) 
flexible hexdump
Home Page: http://sourceforge.net/projects/hexencoder

I did not look to see how flexible is 'flexible', but there it is.

> Rationale:
> 1. Debug.
>      Generic binary data can't be output to console.

That depends on the console. Old IBM PCs had a character for every byte. 
That was meant for line-drawing, accents, and symbols, but could also be 
used for binary dumps. I believe there are Windows codepages that will 
do something similar. Any bytes can be decoded as latin-1 and then printed.
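
A small sketch of the latin-1 route, for illustration. Whether the result
is readable still depends on the console, since control bytes such as
0x00-0x1f are passed through as control characters.

```python
# Every byte value 0..255 maps to exactly one latin-1 code point,
# so decoding arbitrary bytes as latin-1 cannot fail.
data = bytes(range(256))
text = data.decode("latin-1")

# The mapping is reversible, so nothing is lost.
assert text.encode("latin-1") == data

# Show a printable slice; repr() keeps any odd characters visible.
print(repr(text[0x41:0x44]))
```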

> A separate helper
> is needed to print, log or store its value in human readable format in
> database. This takes time.

A custom helper gives custom output.

> 2. Usability.
>      binascii is ugly: name is not intuitive any more, there are a lot
> of functions, and it is not clear how it relates to unicode.

Even if there are lots of functions, one more might be added.
What does 'it' refer to? hexdump or binascii? Both are about binary 
bytes and not about unicode characters, so neither relates to abstract 
unicode. Encoded unicode characters are binary data like any other, 
though if the encoding is utf-16 or utf-32, one would want 2 or 4 bytes 
dumped together, as I suggested above.
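
For example, a sketch of dumping encoded text one code unit at a time. The
helper name hex_units is hypothetical; it only wraps binascii.hexlify.

```python
import binascii


def hex_units(data, size):
    """Return the hex text of `data`, space-separated every `size` bytes.

    Hypothetical helper for illustration only.
    """
    return " ".join(
        binascii.hexlify(data[i:i + size]).decode("ascii")
        for i in range(0, len(data), size)
    )


sample = "h\u00e9"  # 'h' followed by e-acute
print(hex_units(sample.encode("utf-16-be"), 2))  # 0068 00e9
print(hex_units(sample.encode("utf-32-be"), 4))  # 00000068 000000e9
```

With the unit size matched to the encoding, each group is one code unit
rather than an arbitrary pair of bytes.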

-- 
Terry Jan Reedy