[summerofcode] SOC networking project idea.

tim gosselin gosselit at gmail.com
Thu Jun 9 16:42:46 CEST 2005


Hi, this is Tim. I'm a long-time (4 yr) Python enthusiast and would
like to work on one of my projects through Google's SoC. This may be a
little much to post to this group as it probably doubles the total
digest size, hah! but the SF project I just set up hasn't gone through
yet. Anyway, below is a quick description of one of the two ideas that
I submitted. I started working on this program a few months ago and
drafted all of the basic functionality (around 4000 lines) but had to
abandon it in the interest of studying for finals.

I am quite enthusiastic about completing a beta of this because I
created it to fulfill a need that I had at school. I had several
problems: 1) There is a very low per-interface limit on our Ethernet
network at school, yet I have three network ports in my room! 2) There
is a very, very low per-connection limit on ssh at my friend's school.
Want to scp over some pictures? Forget it. 3) The school's network is
very unreliable. I have a couple of free dial-up accounts in the area,
so it would be great if I could use both at once for redundancy.

After some research, I found that (1) can be readily solved, but it
requires support from the switch, which is not possible for me. For
(2) I could find no generic solution, not to mention that almost every
file-sharing program is banned on the school network. For (3) there is
definitely no existing way to bond connections that go through totally
different ISPs. Maybe some of the Twisted guys have experience with
these problems? Any suggestions or comments would be great!

Yesterday I put more graphs into Graphviz format, which I'll put on
the SF site soon, but if any of the mentors want to know what I'm
really all about, feel free to contact me.

=====================
Project Title
--------------
  A generic TCP stream multiplexer.

Synopsis
---------
  This project will create a special network tunnel that accepts an
incoming connection, splits it's data across a number network tunnels
(ssh/whatever), and reassembles the data from these tunnels on the
other side. This allows bandwidth aggregation, reliability,
redundancy, and increasing cost effectiveness of network utilization.

Benefits to the Python/OSS Community
---------------------------------
  Time and again, in many modern languages, a great networking program
is written but turns out to be impractical because of its threaded
networking design. Writing programs in an asynchronous fashion solves
this problem, but many programmers have trouble embracing the
technique. I plan to explore solutions based on poll(), generators,
and the Twisted framework. I will then document my design decisions
and the challenging refactoring from limited threading approaches to a
variety of async ones.
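
As a taste of the asynchronous style I'm talking about, here is a
minimal Twisted sketch; the TunnelForwarder name and its echo
behaviour are just placeholders, not part of the actual design. It
accepts connections and handles data without any threads.

# Minimal sketch of the asynchronous (Twisted) style.  TunnelForwarder
# is a hypothetical placeholder: it accepts a connection and echoes
# bytes back without blocking, which is the pattern the real
# multiplexer would build on.
from twisted.internet import protocol, reactor

class TunnelForwarder(protocol.Protocol):
    def dataReceived(self, data):
        # In the real program this is where data would be handed to a
        # scheduler and split across the open tunnels; here it is
        # simply echoed back.
        self.transport.write(data)

factory = protocol.ServerFactory()
factory.protocol = TunnelForwarder
reactor.listenTCP(5555, factory)
reactor.run()
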
  The OSS community in general benefits because this project fills a
gap in the networking toolset. I expect it to fit right in with other
generic tools like ssh, socat, and netcat. This is a very useful idea
in networking, and in the emerging peer-to-peer (p2p) networks in
particular, but to my knowledge it has never been generically
implemented as a library or stand-alone program.

The variety of features in this program will be very beneficial:
 * Bandwidth aggregation across a number of physical links at the
local computer, such as across multiple interfaces or modems. In
researching bandwidth aggregation methods, I found that most solutions
involve a low-level driver for bonding Ethernet or serial lines;
forget about bonding a T1 and a backup DSL line that go through
different ISPs unless you want to do a lot of work. Furthermore, these
are not user-space, on-the-fly configurable methods, so many of the
possible advantages are ruled out. This program is much more flexible.
 * Overcoming bottlenecks further out on the network. For example, to
saturate discrete network paths, or even to increase bandwidth to a
host that has arbitrary per-connection bandwidth limits (strange
firewall stuff).
 * Failover and redundancy due to automatic reconnecting of
intermediate tunnels.
 * Cutting the cost of networking. For example, if a person has:
   ** A cheap/unreliable/slow connection and a reliable/fast/expensive
connection. The program can be made to favor the cheaper connection
but fall back to the other when it is unavailable.
   ** Two cheaper lines aggregated instead of one expensive one.
   ** An existing set of connections. The program lets someone add
more connections as needed instead of scrapping everything and
upgrading each time.

Deliverables
-------------
  1) A stand-alone program that can easily be used to add redundancy
and increase throughput for any networking task.
  2) An API, usable by other programs, that solves a very common
networking problem and provides fine-grained control over it.

  Future:
   Other stuff listed in the description below.

Project Details
----------------
      The basic idea is really simple. The program/library will
consist of at least a client part and a server part. An incoming
connection will be accepted by the client, the data from this socket
will be split among a number of intermediate sockets, and it will be
recombined by the server at the other end:
                                 tunnels
                             /-------------\
initiating socket -> client ---------------- server -> remote server app
                             \-------------/
example:      port 5555    multiple ports/interfaces  port 4444

This is like a regular unencrypted ssh tunnel, except the data is
divided among many tunnels. It would not be much more difficult to
allow "reverse" connections as well.

Now, the more challenging features, in a rough order that I plan to
implement them are:
  1) The program is almost useless if it makes one-shot tunnels like
netcat. There will have to be socket ids so that multiple sockets from
one port can be tracked through the tunnel (like socat or ssh).
  2) For the ultimate in scalability, it would be nice to have only
ONE instance of the program running per machine. This program will run
as a daemon and take requests from the local machine to set up new
clients and servers on given ports, along with more advanced commands.
  3) Tying in with #2, I think the best interface for the program
would be a simple command set over TCP (a la eDonkey or MySQL); this
will be the backend for basic command-line tools that communicate with
the daemon (msaddtunnel, mssetbandwidthlimit, ...). Further down the
road, a web front end could be added to manage the tunnels too.
  4) A variety of schedulers could be made. These schedulers would be
the heart of the program, deciding how much data to send across which
pipe and when. Different ones could be optimized for: low latency,
high throughput, tunnel priority (QoS), low memory usage, low CPU
usage, etc. This is where the real fun comes in, and I'm sure that
with a good framework many people will suggest schedulers with other
properties. A minimal scheduler sketch follows this list.
  5) The program should be able to communicate with multiple remote
servers and especially not trash everything when someone accidentally
adds a tunnel to a different server than all of the other tunnels.
This will necessitate the use of server ids.
  6) There should be a fixed number of sockets through each tunnel
(default 1). Using socket ids and channel ids, multiple ports and
socket connections will be combined together. This adds the benefit of
decreased latency when starting a bunch of short-lived connections,
due to the slow ramp-up of TCP; X11 and HTTP are good examples. This
is what ssh does too.
  7) Allow for tunnel grouping for easier admin, see examples below.
  8) Bandwidth limiting for tunnels and whole channels. Unfortunately
it seems that the most graceful way to do this is for each side to
limit its upload. This must be coordinated.
  9) Maybe, rewrite the critical path in C. This should be a short
task but may not happen for a while, because when do you really run
out of fun features to add?
  10) Maybe, rewrite it all in C!?
  11) And all of the other stuff that I forgot to write here.
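
Here is the minimal scheduler sketch promised in point 4: a weighted
round-robin policy that decides which tunnel carries the next chunk.
The policy and the (name, priority) pairs are illustrative
assumptions, not the final design.

# Given a set of tunnels with priorities, pick the tunnel that should
# carry the next chunk of data.
class WeightedRoundRobin:
    def __init__(self, tunnels):
        # tunnels: list of (name, priority) pairs; a higher priority
        # means the tunnel is picked more often.
        self.slots = []
        for name, priority in tunnels:
            self.slots.extend([name] * priority)
        self.index = 0

    def next_tunnel(self):
        # Cycle through the priority-weighted slot list.
        name = self.slots[self.index % len(self.slots)]
        self.index += 1
        return name

sched = WeightedRoundRobin([("joe", 15), ("frank", 1)])
# Every chunk of incoming data is handed to whichever tunnel
# sched.next_tunnel() returns.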

So, the topology of the final project should look like this:

command line     
admin programs:      manager daemon:
 -----------       ------------------
 |addtunnel|---->-P| <-admin port   |
 -----------  /    |                |
 |deltunnel|-/     |                |                tunnels
 -----------       |  --------------- server socks  /-------:
                   |  | Multiplexing|P--<-------------------: to remote machine
 client socks--->-P|--| API         |         /     \-------: 
                   |  |             |P-->----/
 server output--<-P|--|             |  client socks
                   |  --------------|
                   ------------------
                           ^
                           |
                        The API implements the client and server.

Notes: The arrows indicate the direction in which the connections are
initiated, but data flows in both directions on all sockets. There is
one admin port, and zero or more ports of all other types of
connections.

Possible command line examples:
 Here is the rough idea. I can provide more refined examples once I
dig out my notes. A rough sketch of the admin-port command handling
follows the examples.

 # msaddtunnels --names=joe,frank,bob --tunnels=joe.com:3456,frank.ru:789,bob:6
 # mslimittunnel joe 50
 # mspriotunnel frank 1
 # mspriotunnel joe 15
 # mscreategroup -n dsl_users -t joe,frank
 # mssetdownloadlimit -g dsl_users 100 
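
As a rough sketch of the admin-port command handling mentioned in
point 3, here is how the daemon might parse a simple text command set.
The command names and argument layout mirror the rough examples above
and are not a fixed protocol.

# Parse one line received on the admin port and update the tunnel table.
def handle_admin_line(line, tunnels):
    parts = line.strip().split()
    if not parts:
        return "ERR empty command"
    cmd, args = parts[0], parts[1:]
    if cmd == "addtunnel" and len(args) == 2:
        name, endpoint = args
        tunnels[name] = {"endpoint": endpoint, "limit": None}
        return "OK added %s -> %s" % (name, endpoint)
    elif cmd == "limittunnel" and len(args) == 2:
        name, limit = args
        tunnels[name]["limit"] = int(limit)
        return "OK limited %s to %s" % (name, limit)
    return "ERR unknown command"

tunnels = {}
handle_admin_line("addtunnel joe joe.com:3456", tunnels)  # -> "OK added joe -> joe.com:3456"
handle_admin_line("limittunnel joe 50", tunnels)          # -> "OK limited joe to 50"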

Project Schedule
-----------------
  One person will be able to bring this project to a usable state
within about one month. There is plenty of room for additional
features and modifications in the remaining time. My summer plans are
set so I will be able to begin work immediately and continue for the
entire summer.

Bio
----
  I am a student majoring in EE at Norwich University. This fall I
will go into my junior year. My prior programming experience ranges
from medium comfort in Saturn assembly on my HP48 and C, to pushing
the limits in bash, Java, and Python. I am also interested in crypto,
functional programming, error-correction algorithms, light graph
theory, and am a regular reader of LTE. I began using Linux with Red
Hat around 1998, distro-hopped, and finally settled on Gentoo in 2001.
My primary desktop runs on an Opteron 240, which I got shortly after
their release. I still have fond memories of hacking numerous programs
to get them working in 64-bit.

