Python for large projects

Wed Mar 24 12:07:03 EST 2004

gabor wrote:
> On Wed, 2004-03-24 at 15:16, Bill Rubenstein wrote:
> > ...snip...
> > > > the other thing is that, in the projects i work on, it seems to be
> > > > very hard to do unit tests
> > ...snip...
> >
> > The ability to do unit testing should not be an afterthought.  It should be
> > considered as a major influence on the architecture of a project.
> >
> > If one cannot do proper unit testing, the architecture of the project is
> > questionable.
>
> ok, so let's use a specific example:
>
> imagine you're building a library, which fetches webpages.
>
> you have a library which can fetch 1 webpage at a time, but it is a
> synchronous library (like wget). you call it, and it returns the page.
>
> but you want an async one.
>
> so you decide to build a threadpool, where every thread will do this:
> look into a queue, and if there is a new URL to fetch, fetch it with
> the wget-like library, and save the html page somewhere (and maybe
> signal something).
>
> and now the user who uses your library, simply adds the URL to fetch,
> and can check later asynchronously whether they are already fetched or
> not.
>
> could you tell me what unit tests you would create for this example?

Which unit tests you create depends on the structure of your library and what
APIs it exposes, but FWIW we test stuff like this all the time (although some
people might call our tests involving the other side of the network connection
"system tests" - I don't care, as the point is that they detect regressions and
ensure correct functionality).
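
To make the example concrete, here is a rough sketch of the kind of library
gabor describes: worker threads pull URLs from a queue, call a blocking fetch
function, and store the result so the caller can check on it later. The names
AsyncFetcher, add, result, wait and fake_fetch are made up for illustration,
and passing the fetch function in as a parameter is my assumption, not
something from the original description. It is written for Python 3.

```python
import queue
import threading


class AsyncFetcher:
    """Hypothetical async wrapper around a blocking fetch(url) callable."""

    def __init__(self, fetch, num_workers=4):
        self._fetch = fetch            # blocking, wget-like function: url -> page
        self._queue = queue.Queue()    # URLs waiting to be fetched
        self._results = {}             # url -> ('ok', page) or ('error', exception)
        self._lock = threading.Lock()  # guards _results
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def add(self, url):
        """Queue a URL; returns immediately."""
        self._queue.put(url)

    def result(self, url):
        """Return the stored outcome for url, or None if not fetched yet."""
        with self._lock:
            return self._results.get(url)

    def wait(self):
        """Block until every queued URL has been processed (handy in tests)."""
        self._queue.join()

    def _worker(self):
        while True:
            url = self._queue.get()
            try:
                outcome = ('ok', self._fetch(url))
            except Exception as exc:   # record the failure, keep the thread alive
                outcome = ('error', exc)
            with self._lock:
                self._results[url] = outcome
            self._queue.task_done()


def fake_fetch(url):
    """Stand-in for the real network call (assumption: used only in tests)."""
    if 'bad' in url:
        raise IOError('simulated failure')
    return '<html>%s</html>' % url


fetcher = AsyncFetcher(fake_fetch, num_workers=2)
fetcher.add('http://example.invalid/a')
fetcher.add('http://example.invalid/bad')
fetcher.wait()
```

With the fetch function swapped for a fake like this, the queueing and
bookkeeping can be tested with no network at all; the tests that exercise the
real fetch code are the ones that need a server on the other end.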

The nightly batch fires up a very simple Python server (we pass in the host and
port to bind to on the command line), and we encode commands in the URLs
telling the server how to respond, e.g.

http://127.0.0.1:4000/cmd_timeout_10/foo.avi
    # instead of responding, sleep for 10 seconds to test client timeouts
http://127.0.0.1:4000/cmd_mb_1/bar.avi
    # return a 1 MB file
http://127.0.0.1:4000/cmd_redir_www.google.com/baz.avi
    # return an HTTP redirect to www.google.com
http://127.0.0.1:4000/cmd_mb_2/cmd_dropconnafter_500k/bleh.txt
    # return a 2 MB response, but drop connection after 500k bytes
http://127.0.0.1:4000/cmd_shutdown/biff.avi
    # shut down the server

This server is trivial to write - it just splits the URL on '/', interprets
any 'cmd_' portions, and generates an appropriate response. With something
like that in place you can test all sorts of things in your library. A lot of
your tests won't require a server at all, since they will be testing how your
library reacts to bad inputs, but for the tests that do require a server you
have a block of tests like:

ADDR = '127.0.0.1', 4000
StartListenServer(ADDR) # spawns the server process
# all of your tests that require a remote server to be present
StopListenServer(ADDR)

Whatever your file fetching tool does, you can test.

-Dave