Learning python networking

Wed Jan 15 07:52:44 EST 2014

On Wed, Jan 15, 2014 at 9:37 PM, Paul Pittlerson <menkomigen6 at gmail.com> wrote:
> I'm sorry if this is a bit late of a response, but here goes.
>
> Big thanks to Chris Angelico for his comprehensive reply, and yes, I do have some questions!

Best way to learn! And the thread's not even a week old, so this isn't
late. Sometimes there've been responses posted to something from
2002... now THAT is thread necromancy!!

>> On Thursday, January 9, 2014 1:29:03 AM UTC+2, Chris Angelico wrote:
>> Those sorts of frameworks would be helpful if you need to scale to
>> infinity, but threads work fine when it's small.
> That's what I thought, but I was just asking if it would be like trivially easy to set up this stuff in some of those frameworks. I'm sticking to threads for now because the learning curve of twisted seems too steep to be worth it at the moment.

I really don't know, never used the frameworks. But threads are fairly
easy to get your head around. Let's stick with them.
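
A minimal sketch of the thread-per-connection approach, assuming Python 3 and plain blocking sockets. Each client gets its own thread, so a blocking recv() only stalls that one client. The names handle_client and start_server are hypothetical, and the echo is just a placeholder for real game logic:

```python
import socket
import threading

def handle_client(conn):
    # Runs in its own thread; a blocking recv() here stalls only this client.
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break  # empty read: the client has disconnected
            conn.sendall(data)  # placeholder: echo instead of game logic

def start_server(host='127.0.0.1', port=0):
    # port=0 lets the OS pick a free port; read it back with getsockname().
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)

    def accept_loop():
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                break  # listener closed or failed; stop accepting
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()

    # Daemon threads won't keep the process alive on exit.
    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

The server takes as many clients as connect, with no need to say up front how many there will be.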

>> Absolutely! The thing to look at is MUDs and chat servers. Ultimately,
>> a multiplayer game is really just a chat room with a really fancy
>> front end.
> If you know of any open source projects or just instructional code of this nature in general I'll be interested to take a look. For example, you mentioned you had some similar projects of your own..?
>

Here's something that I did up as a MUD-writing tutorial for Pike:

http://rosuav.com/piketut.zip

I may need to port that tutorial to Python at some point. In any case,
it walks you through the basics. (Up to section 4, everything's the
same, just different syntax for the different languages. Section 5 is
Pike-specific.)

>> The server shouldn't require interaction at all. It should accept any
>> number of clients (rather than getting the exact number that you
>> enter), and drop them off the list when they're not there. That's a
>> bit of extra effort but it's hugely beneficial.
> I get what you are saying, but I should mention that I'm just making a 2 player strategy game at this point, which makes sense of the limited number of connections.

One of the fundamentals of the internet is that connections *will*
break. A friend of mine introduced me to Magic: The Gathering via a
program that couldn't handle drop-outs, and it got extremely
frustrating - we couldn't get a game going. Build your server such
that your clients can disconnect and reconnect, and you protect
yourself against half the problem; allow them to connect and kick the
other connection off, and you solve the other half. (Sometimes, the
server won't know that the client has gone, so it helps to be able to
kick like that.) It might not be an issue when you're playing around
with localhost, and you could even get away with it on a LAN, but on
the internet, it's so much more friendly to your users to let them
connect multiple times like that.

>> One extremely critical point about your protocol. TCP is a stream -
>> you don't have message boundaries. You can't depend on one send()
>> becoming one recv() at the other end. It might happen to work when you
>> do one thing at a time on localhost, but it won't be reliable on the
>> internet or when there's more traffic. So you'll need to delimit
>> messages; I recommend you use one of two classic ways: either prefix
>> it with a length (so you know how many more bytes to receive), or
>> terminate it with a newline (which depends on there not being a
>> newline in the text).
> I don't understand. Can you show some examples of how to do this?

Denis gave a decent explanation of the problem, with a few
suggestions. One of the easiest to work with (and trust me, you will
LOVE the ease of debugging this kind of system) is the line-based
connection. You just run a loop like this:

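buffer = b''  # bytes received but not yet returned as a line

def gets():
    global buffer  # sock is assumed to be the connected socket
    while b'\n' not in buffer:
        data = sock.recv(1024)
        if not data:
            # Client is disconnected, handle it gracefully
            return None # or some other sentinel
        buffer += data  # keep everything received so far
    line, buffer = buffer.split(b'\n', 1)
    # Decode only a complete line; drop the '\r' a telnet client sends
    return line.decode('utf-8').replace('\r', '')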

You could put this into a class definition that wraps up all the
details. The key here is that you read as much as you can, buffering
it, and as soon as you have a newline, you return that. This works
beautifully with the basic TELNET client, so it's easy to see what's
going on. Its only requirement is that there be no newlines *inside*
commands. The classic MUD structure guarantees that (if you want a
paragraph of text, you have some marker that says "end of paragraph" -
commonly a dot on a line of its own, which is borrowed from SMTP), and
if you use json.dumps(), it'll represent a newline inside a string as
two characters, a backslash followed by "n", so that's safe too.
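
Wrapped up in a class, it might look something like this. It's a sketch assuming Python 3 and UTF-8, and LineBuffer and encode_message are hypothetical names. Keeping the buffering separate from the socket means you can test it by feeding in chunks by hand. Splitting the bytes on b'\n' before decoding is safe with UTF-8, because the byte 0x0A never appears inside a multi-byte character:

```python
import json

class LineBuffer:
    """Socket-independent version of gets(): feed it recv() output,
    get back every complete line received so far."""

    def __init__(self):
        self._buf = b''

    def feed(self, data):
        self._buf += data
        lines = []
        while b'\n' in self._buf:
            line, self._buf = self._buf.split(b'\n', 1)
            # Packetize first, then decode; tolerate telnet-style CRLF.
            lines.append(line.decode('utf-8').replace('\r', ''))
        return lines

def encode_message(obj):
    # json.dumps escapes a newline inside a string as backslash + "n",
    # so the newline appended here is the only real one on the wire.
    return (json.dumps(obj) + '\n').encode('utf-8')

# Two messages, one containing a newline, arriving 5 bytes at a time.
buf = LineBuffer()
wire = encode_message({'chat': 'line one\nline two'}) + encode_message({'move': 3})
received = []
for i in range(0, len(wire), 5):
    received.extend(json.loads(line) for line in buf.feed(wire[i:i + 5]))
```

Chopping the stream into 5-byte pieces mimics what TCP is allowed to do to you, and the messages still come out whole.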

The next easiest structure to work with is length-preceded, which
Denis explained. Again, you read until you have a full packet, but
instead of "while b'\n' not in buffer", it would be "while
len(buffer) < packetlen". Whichever way you mark packets, you have to be
prepared for both problems: incomplete packets and merged packets.
And both at once, too - one recv() call might return the tail of one
and the beginning of another.

Note that I've written the above example with the intent that it'll
work on either Python 2 or Python 3 (though it's not actually tested
on either). The more code you can do that way, the easier it'll be...
see next point.

>> Another rather important point, in two halves. You're writing this for
>> Python 2, and you're writing with no Unicode handling. I strongly
>> recommend that you switch to Python 3 and support full Unicode.
> Good point, however the framework I'm using for graphics does not currently support python3. I could make the server scripts be in python3, but I don't  think the potential confusion is worth it until the whole thing can be in the same version.
>

Hmm. Maybe, but on the flip side, it might be better to first learn
how to do things in Py3, and then work on making your code able to run
in Py2. That way, you force yourself to get everything right for Py3,
and then there's no big porting job to move. Apart from the fact that
you have to learn two variants of the language, there's nothing
stopping the server being Py3 while the client's in Py2.

>> Note, by the way, that it's helpful to distinguish "data" and "text",
>> even in pseudo-code. It's impossible to send text across a socket -
>> you have to send bytes of data. If you keep this distinction clearly
>> in your head, you'll have no problem knowing when to encode and when
>> to decode. For what you're doing here, for instance, I would packetize
>> the bytes and then decode into text, and on sending, I'd encode text
>> (UTF-8 would be hands-down best here) and then packetize. There are
>> other options but that's how I'd do it.
> I'm not sure what you are talking about here. Would you care to elaborate on this please (it interests and confuses) ?

Fundamentally, network protocols work with bytes. (Actually, they're
often called 'octets' - and a good number of protocols work with
*bits*, of which an octet is simply a group of eight.) They don't work
with characters. But humans want to send text. I'll use these posts as
an example.

1) Human types text into mail client, newsreader, or web browser.
2) Client wraps that text up somehow and sends it to a server.
3) Server sends it along to other servers.
4) Server sends stuff to another client.
5) Client unwraps the text and shows it to a human.

For there to be viable communication, the text typed in step 1 has to
be the same as the text shown in step 5. So we need to have some kind
of system that says "This character that I see on my screen is
represented by this byte or sequence of bytes". That's called an
encoding. It's simply a mapping of characters to byte sequences.
(There are other complexities, too, but I'll handwave those for the
moment. For now, let's pretend that one character that you see on the
screen - more properly termed a glyph - is represented by one stream
of bytes.) The best way to handle this is Unicode, because it's fairly
safe to assume that most people can handle it. So you design your
protocol to use Unicode characters and UTF-8 encoding. That means
that:

* The character 'A' (LATIN CAPITAL LETTER A) is represented by code
point U+0041 and byte sequence 0x41
* The character '©' (COPYRIGHT SIGN) is code point U+00A9 and bytes 0xC2 0xA9
* '☺' (WHITE SMILING FACE) is U+263A and bytes 0xE2 0x98 0xBA
* '𒍅' (CUNEIFORM SIGN URU TIMES KI) is U+12345 and bytes 0xF0 0x92 0x8D 0x85

(Tip: http://www.fileformat.info/ is a great place to play around with
these sorts of things.)
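
Python 3 itself is handy for checking these, too. A small sketch using the standard unicodedata module (describe is just an illustrative name):

```python
import unicodedata

def describe(ch):
    # Hex digits of the UTF-8 encoding, e.g. 'E2 98 BA'
    utf8 = ' '.join('%02X' % b for b in ch.encode('utf-8'))
    name = unicodedata.name(ch, '<unnamed>')
    return 'U+%04X %s -> %s' % (ord(ch), name, utf8)

for ch in ['A', '\u00a9', '\u263a', '\U00012345']:
    print(describe(ch))
```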

An agreement like this means that one human can type a white smiling
face, his client will interpret it as U+263A, the email and news posts
will contain E2 98 BA, and the human at the other end will see a white
smiling face. There's more to it than that, at least with these posts,
because not everyone uses UTF-8 (so the encoding has to be declared),
but if you're creating a brand new protocol, you can simply mandate
it. I strongly recommend UTF-8, by the way; it's compact for text
that's mostly Latin characters, it's well known, and it covers the
entire Unicode range (unlike, say, CP-1252 as used on Windows, or the
ISO-8859-? series).
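
A quick way to see the difference is to encode the same strings with UTF-8 and with two single-byte codecs ('cp1252' and 'latin-1' are Python's standard codec names). try_encode is a hypothetical helper that returns None when the codec has no bytes for some character:

```python
def try_encode(text, encoding):
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None  # this encoding can't represent some character

samples = ['hello', 'caf\u00e9', '\u263a']
for enc in ['utf-8', 'cp1252', 'latin-1']:
    print(enc, [try_encode(s, enc) for s in samples])
```

Plain ASCII costs one byte per character in UTF-8, same as the single-byte codecs, but only UTF-8 can encode the smiley at all.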

Since you're working with JSON, you could choose to work with ASCII,
as JSON has its own notation for incorporating non-ASCII characters in
an ASCII stream. But I think it's probably better to use UTF-8.
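
For illustration, Python's json.dumps does the ASCII escaping by default (ensure_ascii=True) and keeps the real characters if you pass ensure_ascii=False. Either form decodes back to the same object; the UTF-8 version is just shorter and easier to read in a telnet session:

```python
import json

message = {'say': 'hi \u263a'}

ascii_json = json.dumps(message)                     # smiley becomes \u263a
utf8_json = json.dumps(message, ensure_ascii=False)  # smiley kept as-is

ascii_bytes = ascii_json.encode('ascii')  # works: every character is ASCII
utf8_bytes = utf8_json.encode('utf-8')    # smiley becomes E2 98 BA
```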

One of the huge advantages of Python 3 over Python 2 is that it forces
you to think about this up-front. There is a stark divide between
bytes and text. In Python 2, you can sorta pretend that ASCII text and
bytes are the same thing, which often leads to programs that work
perfectly until they get to a "weird character". Fact is, there are no
weird characters. :) I recommend this talk by Ned Batchelder:

http://nedbatchelder.com/text/unipain.html

Watch it, comprehend it, and code with his Facts of Life and Pro Tips
in mind, and you'll have no pain.
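
Here's a small Python 3 illustration of that divide. Bytes off the wire have to be decoded explicitly, decoding with the wrong codec quietly produces garbage, and mixing bytes with text is an immediate error rather than a bug waiting for its first "weird character":

```python
data = b'caf\xc3\xa9'          # five bytes, as they might arrive off the wire

text = data.decode('utf-8')     # 'café': four characters
wrong = data.decode('latin-1')  # no error, but gives 'cafÃ©' (mojibake)

try:
    data + ' and more'          # Python 3 refuses to mix bytes and str
    mixed_ok = True
except TypeError:
    mixed_ok = False
```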

> I'm posting this on google groups, so I hope the formatting turns out ok :P thanks.

Your lines are coming out extremely long, but the biggest annoyance of
GG (double-spaced replies) isn't happening. Thank you.

ChrisA