python implementation of a new integer encoding algorithm.

Chris Angelico rosuav at gmail.com
Tue Feb 17 08:16:07 EST 2015


On Tue, Feb 17, 2015 at 10:22 PM,  <janhein.vanderburg at gmail.com> wrote:
> In http://optarbvalintenc.blogspot.nl/ I propose a new way to encode arbitrarily valued integers and the python code that can be used as a reference for practical implementations of codecs.
>
> The encoding algorithm itself is optimized for transmission and storage requirements without any processor load consideration whatsoever.
>
> The next step is the development of the python code that minimizes processor requirements without compromising the algorithm.
>
> Is this the right forum to ask for help with this step or to obtain references to more suitable platforms?

This is a fine forum to ask in. However, you may find that the advice
you get isn't quite what you were asking for. In my case, the advice
I'm offering is: Don't do this. In over 99% of cases, the benefit from
this kind of packed transmission format is negligible; it's much MUCH
better to make your protocol simple readable text. Have a look at
internet protocols like SMTP/POP/IMAP (the email trio), HTTP, FTP, and
so on; all of them follow a basic pattern of connecting to a
well-known port on a destination server, sending textual commands
terminated by end-of-line, and receiving textual responses terminated
by end-of-line. Most of the world wide web consists of HTTP requests
(possibly wrapped in SSL/TLS, but that doesn't change the protocol), and
even at that volume, the advantage of packing the headers into a binary
format just isn't worth the cost of making everything harder to debug.

Because debugging is *hugely* easier when you have a text protocol.
All you need to do is telnet or netcat to the appropriate port, type
some commands, and eyeball the responses. I have a MUD client called
Gypsum [1] which does that, with a few extra features, including (like
netcat, but unlike telnet) listening on a port, so you can test a
client; I've used it and its predecessor for testing myriad networking
programs, both my own and other people's, to try to figure out what's
going on.

So if you want to develop a brand new protocol, here's what I'd suggest:

1) Mandate UTF-8 encoding everywhere
2) Stipulate \n as the end of line, and strip/ignore all \r found
3) Follow the basic model laid down by SMTP and POP: the server sends
a greeting on startup, then everything's done with simple commands and
responses. The server may also send unilateral messages, as long as
they're unambiguously detectable (e.g. if it's notifying you of
something that just happened).
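
As a rough sketch of how little code those three rules need, here is one
way the line handling might look in Python 3. The names LineReader,
handle_command and GREETING, and the ECHO/ADD commands, are made up for
illustration; they aren't part of any existing protocol or library.

```python
# Hypothetical greeting the server would send on connect (rule 3).
GREETING = "HELLO example-server ready\n"


class LineReader:
    """Collects raw bytes from a socket and hands back complete text lines."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        # Rule 2: ignore every \r, treat \n as the only line terminator.
        self._buf += data.replace(b"\r", b"")
        # Everything before the last \n is complete; the rest stays buffered.
        *complete, self._buf = self._buf.split(b"\n")
        # Rule 1: all text on the wire is UTF-8.
        return [line.decode("utf-8") for line in complete]


def handle_command(line):
    """Return the one-line textual response for a command (made-up commands)."""
    verb, _, arg = line.partition(" ")
    verb = verb.upper()
    if verb == "ECHO":
        return "OK " + arg + "\n"
    if verb == "ADD":
        try:
            total = sum(int(x) for x in arg.split())
        except ValueError:
            return "ERR expected integers\n"
        return "OK %d\n" % total
    return "ERR unknown command\n"
```

A server loop would send GREETING, then pass whatever recv() returns to
feed() and write back handle_command(line) for each line it yields.
Integers travel as plain decimal digits, so "ADD 200 10" is readable in
any terminal.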

Sure, you might be able to pack your integers into something more
compact than their decimal representations... but how much will you
really gain? Most of the internet works on the basis of packets, and
you'll find that the difference between a 200-byte packet and a
210-byte packet probably isn't even measurable; at very best, you
might see an advantage when you transfer a huge file as a coherent
blob, but that would mean dealing with a large number of these
packetized integers. In the meantime, you make your protocol fragile
and hard to read by eye, and you spend a lot of time developing your
protocol, instead of just blatting simple text down the wire. Take the
easy option; you can always make things more complicated later.
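
If you want to put numbers on that, something like the following would
do. Note that it uses unsigned LEB128 (the common "varint" scheme) purely
as a stand-in for a packed format; it is NOT the encoding proposed in the
blog post, and the function names are invented for this example.

```python
def encode_varint(n):
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.

    Stand-in for "some packed integer format"; not the blog's algorithm.
    """
    if n < 0:
        raise ValueError("unsigned integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def decimal_line(n):
    """The text alternative: decimal digits plus a newline delimiter."""
    return (str(n) + "\n").encode("utf-8")


def bytes_saved(values):
    """How many bytes the packed form saves over decimal text for these values."""
    text = sum(len(decimal_line(v)) for v in values)
    packed = sum(len(encode_varint(v)) for v in values)
    return text - packed
```

For the three values 1, 300 and 70000, the text form is 12 bytes and the
packed form is 6. That's the scale of saving to weigh against losing the
ability to read the traffic by eye.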

[1] https://github.com/Rosuav/Gypsum

ChrisA


