Shrinky-dink Python (also, non-Unicode Python build is broken)

Larry Hastings larry at hastings.org
Mon Jan 16 13:19:19 EST 2006


I'm an indie shareware Windows game developer.  In indie shareware
game development, download size is terribly important; conventional
wisdom holds that--even today--your download should be 5MB or less.

I'd like to use Python in my games.  However, python24.dll is 1.86MB,
and zips down to 877k.  I can't afford to devote 1/6 of my download
to just the scripting interpreter; I've got music, and textures, and
my own crappy code to ship.

Following a friend's suggestion, as an experiment I downloaded the
Python 2.4.2 source, then set about stripping out everything I could.
I removed:
 * Unicode support, including the CJK codecs
 * All doc strings
 * *Every* module written in C
Now when I build, python24.dll is 570k, and zips down to about 260k.
But I learned some things on the way.


First and foremost: turning off Py_USING_UNICODE *breaks the build*
on Windows.  The following breakages were all fixed with
judicious applications of #ifdef Py_USING_UNICODE:
* The implementation of "multi-byte codecs" (CJK codecs) implicitly
  assumes that they can use all the Unicode facilities.  So all the
  files in "Modules/cjkcodecs" fail to build.
* Obviously, the Unicode string object depends on Unicode support,
  so Objects/unicode* doesn't build.
* There are several spots in the code that need to handle Unicode
  strings in some slightly special way, and assume Unicode is turned
  on.  E.g.:
    * Modules/posixmodule.c, posix__getfullpathname(), line 1745
    * same file, posix_open(), starting on line 5201
    * Objects/fileobject.c, open_the_file(), starting on line 158
    * _winreg.c, Py2Reg(), starting on lines 724 and 777

In addition, there was one slightly more complicated problem: _winreg.c
assumes it should call PyUnicode_DecodeMBCS() to turn strings pulled
from the registry into Unicode strings.  I'm not sure what the correct
thing to do here is; I went with changing the calls from
PyUnicode_DecodeMBCS() to PyString_FromStringAndSize() for non-Unicode
builds.
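
To make the shape of that change concrete, here is a minimal,
self-contained sketch of the pattern.  It does not use the real Python
headers: DEMO_USING_UNICODE stands in for Py_USING_UNICODE, and
demo_string and decode_registry_string() are hypothetical names made up
for illustration.  The two real calls are only mentioned in comments,
so treat this as an outline of the idea rather than the actual patch.

```c
/* Sketch only.  DEMO_USING_UNICODE stands in for Py_USING_UNICODE, and
   demo_string / decode_registry_string() are hypothetical stand-ins,
   not CPython names. */
#include <stdlib.h>
#include <string.h>

/* #define DEMO_USING_UNICODE */  /* leave undefined for the non-Unicode path */

typedef struct {
    int is_unicode;   /* 1 if built by the "decode" path, 0 if raw bytes */
    size_t length;    /* number of bytes stored, excluding the terminator */
    char *bytes;      /* NUL-terminated private copy of the data */
} demo_string;

/* Copy len bytes into a freshly allocated demo_string; NULL on failure. */
static demo_string *
demo_string_new(const char *data, size_t len, int is_unicode)
{
    demo_string *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->bytes = malloc(len + 1);
    if (s->bytes == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->bytes, data, len);
    s->bytes[len] = '\0';
    s->length = len;
    s->is_unicode = is_unicode;
    return s;
}

/* Turn a string pulled from the registry into a string object. */
demo_string *
decode_registry_string(const char *data, size_t len)
{
#ifdef DEMO_USING_UNICODE
    /* Unicode build: the real code calls PyUnicode_DecodeMBCS() here. */
    return demo_string_new(data, len, 1);
#else
    /* Non-Unicode build: keep the bytes as they are, which is the role
       PyString_FromStringAndSize() plays in the change described above. */
    return demo_string_new(data, len, 0);
#endif
}
```

With DEMO_USING_UNICODE left undefined, the function hands back the raw
bytes untouched; whether that is the right behavior for registry data
in a non-Unicode build is exactly the open question.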

Of course, it's not the most important thing in the world--after all,
I'm the first person to even *notice*, right?  But it seems a shame
that one can break the build so easily.  If it pleases the stewards of
Python, I would be happy to submit patches that fix the non-"using
Unicode" build.


Second of all, the dumb-as-a-bag-of-rocks Windows linker (at least
the one used by VC++ under MSVS .Net 2003) *links in unused static
symbols*.  If I want to excise the code for a module, it is not
sufficient to comment out the relevant _inittab line in config.c.
Nor does it help if I comment out the "extern" prototype for the
init function.  As far as I can tell, the only way to *really* get
rid of a module, including all its static functions and static data,
is to actually *remove all the code* (with comments, or #if, or
whatnot).  What a nosebleed, huh?

So in order to build my *really* minimal python24.dll, I have to hack
up the source something fierce.  It would be pleasant if the Python
source code provided an easy facility for turning off modules at
compile-time.  I would be happy to propose something / write a PEP
/ submit patches to do such a thing, if there is a chance that such
a thing could make it into the official Python source.  However, I
realize that this has terribly limited appeal; that, and the fact
that Python releases are infrequent, make me think it would not be a
terrible hardship to re-hack each new Python release by hand.
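
One possible shape for such a facility, as a rough sketch: wrap both a
module's code and its table entry in the same preprocessor switch, so
that turning the switch off leaves nothing for the linker to drag in.
Everything below is hypothetical.  The DEMO_WITHOUT_* macros, the
demo_inittab struct (loosely modeled on the _inittab table in config.c)
and the two toy "modules" are invented for illustration and are not
part of the Python source.

```c
/* Sketch only: every name here is hypothetical, not from CPython. */
#include <stddef.h>
#include <string.h>

/* Normally a switch like this would come from the compiler command
   line; it is defined here so the example drops the "math" module. */
#define DEMO_WITHOUT_MATH

#ifndef DEMO_WITHOUT_MATH
/* Static data and code for the module; compiled only when enabled. */
static int math_table[256];
static void init_demo_math(void) { math_table[0] = 1; }
#endif

#ifndef DEMO_WITHOUT_TIME
static int time_calls;
static void init_demo_time(void) { time_calls++; }
#endif

struct demo_inittab {
    const char *name;
    void (*initfunc)(void);
};

/* The table entry is guarded by the same switch as the module body. */
static const struct demo_inittab demo_modules[] = {
#ifndef DEMO_WITHOUT_MATH
    {"math", init_demo_math},
#endif
#ifndef DEMO_WITHOUT_TIME
    {"time", init_demo_time},
#endif
    {NULL, NULL}   /* sentinel marking the end of the table */
};

/* Return 1 if a module with this name was compiled in, else 0. */
int demo_module_enabled(const char *name)
{
    const struct demo_inittab *p;
    for (p = demo_modules; p->name != NULL; p++) {
        if (strcmp(p->name, name) == 0)
            return 1;
    }
    return 0;
}
```

Here demo_module_enabled("time") reports the module as present, while
"math" is gone along with its static table, since none of its code ever
reaches the compiler.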


Whatcha think, froods?


/larry/



