[Patches] [ python-Patches-445762 ] Support --disable-unicode

Fri, 17 Aug 2001 11:44:34 -0700

Patches item #445762, was opened at 2001-07-29 14:13
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=445762&group_id=5470

Category: Build
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Support --disable-unicode

Initial Comment:
This patch implements the option --disable-unicode.
In particular, it:
- does not compile unicodeobject, unicodectype, 
_codecsmodule, and unicodedata if Unicode is disabled
- checks for Py_Unicode in all places that use 
Unicode functions
- disables unicode literals, the builtin functions, 
and the string encode and decode methods,
- avoids Unicode literals in a few places in the 
libraries
- adds the types.StringTypes list

Most of the test suite passes with these changes. A 
number of tests fail, mostly because they use Unicode 
literals.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2001-08-17 11:44

Message:
Logged In: YES 
user_id=21627

Committed as

Makefile.pre.in:1.53
configure:1.240
configure.in:1.248
setup.py:1.50
Include/intobject.h:2.22
Include/longobject.h:2.21
Include/object.h:2.86
Include/unicodeobject.h:2.31
Lib/ConfigParser.py:1.36
Lib/copy.py:1.20
Lib/site.py:1.35
Lib/types.py:1.20
Lib/test/pickletester.py:1.7
Lib/test/string_tests.py:1.10
Lib/test/test_b1.py:1.38
Lib/test/test_contains.py:1.8
Lib/test/test_format.py:1.12
Lib/test/test_iter.py:1.18
Lib/test/test_pprint.py:1.5
Lib/test/test_sre.py:1.27
Lib/test/test_support.py:1.25
Lib/test/test_winreg.py:1.10
Misc/NEWS:1.207
Modules/_codecsmodule.c:2.9
Modules/_sre.c:2.63
Modules/_tkinter.c:1.119
Modules/cPickle.c:2.62
Modules/pyexpat.c:2.48
Objects/abstract.c:2.72
Objects/complexobject.c:2.39
Objects/floatobject.c:2.86
Objects/intobject.c:2.62
Objects/longobject.c:1.92
Objects/object.c:2.139
Objects/stringobject.c:2.124
Python/bltinmodule.c:2.227
Python/compile.c:2.218
Python/getargs.c:2.62
Python/marshal.c:1.65
Python/modsupport.c:2.58
Python/pythonrun.c:2.147
Python/sysmodule.c:2.92

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-08-03 00:45

Message:
Logged In: YES 
user_id=21627

I've added an additional test.patch file, which only 
records the changes to Lib/test. With this patch, I get 
the following failures:
test_grammar test___all__ test_charmapcodec test_codecs 
test_gettext test_minidom test_pyexpat test_sax 
test_string test_ucn test_unicode test_unicodedata 
test_urllib test_zipfile1

I don't think this list can be reduced much further 
without seriously impacting the strength of the test suite.

To reduce the number of failures to this list, I also had 
to modify pickle.py to not use Unicode literals anymore. 
I'm not sure whether this is a good idea, as it impacts 
performance; the pickle.patch is attached separately.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-02 09:29

Message:
Logged In: YES 
user_id=38388

Uploaded a revised patch. The test suite still fails -- it
would be nice if you could work this out; I don't want to
check the patch in before the test suite runs through
without failures.

Thanks.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-08-01 23:09

Message:
Logged In: YES 
user_id=21627

Updated patch after merger with descr_branch.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-07-31 00:56

Message:
Logged In: YES 
user_id=21627

Replaced patch, since it contained unrelated fragments.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-07-31 00:48

Message:
Logged In: YES 
user_id=21627

The new version of the patch implements all features that 
have been discussed.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-30 07:39

Message:
Logged In: YES 
user_id=38388

Ok, I see your point about the API references.

About PyString_Encode/Decode: on platforms without Unicode,
the encoding should not have a default, so passing NULL as
the encoding should result in an error. I am not even sure
whether it should have a default on Unicode builds...
probably not.

Trimming down _codecsmodule.c to register and lookup is OK;
there are a few codecs in 2.2 which don't use Unicode at all.
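
A rough sketch of what a registry reduced to register and lookup
could look like, with a missing (NULL) encoding name treated as an
error rather than mapped to a default. This is an illustration
only, not the code in _codecsmodule.c: every demo_* name, the
fixed-size table, and the "hex" bytes-to-bytes codec are
assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical codec record: a name plus a bytes-to-bytes encoder. */
typedef struct {
    const char *name;
    /* Writes a NUL-terminated result into out (capacity cap);
       returns 0 on success, -1 if the buffer is too small. */
    int (*encode)(const char *in, char *out, size_t cap);
} demo_codec;

/* A search function maps an encoding name to a codec, or NULL. */
typedef const demo_codec *(*demo_search_fn)(const char *name);

#define DEMO_MAX_SEARCHERS 8
static demo_search_fn demo_searchers[DEMO_MAX_SEARCHERS];
static size_t demo_searcher_count = 0;

/* register: remember a search function; -1 if invalid or table full. */
static int demo_register(demo_search_fn fn)
{
    if (fn == NULL || demo_searcher_count == DEMO_MAX_SEARCHERS)
        return -1;
    demo_searchers[demo_searcher_count++] = fn;
    return 0;
}

/* lookup: ask each search function in registration order.
   A NULL name is an error here: there is no default encoding. */
static const demo_codec *demo_lookup(const char *name)
{
    size_t i;
    if (name == NULL)
        return NULL;
    for (i = 0; i < demo_searcher_count; i++) {
        const demo_codec *codec = demo_searchers[i](name);
        if (codec != NULL)
            return codec;
    }
    return NULL;
}

/* Example codec that never touches Unicode: bytes to lowercase hex. */
static int demo_hex_encode(const char *in, char *out, size_t cap)
{
    size_t n, i;
    assert(in != NULL && out != NULL);
    n = strlen(in);
    if (cap < 2 * n + 1)
        return -1;
    for (i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", (unsigned int)(unsigned char)in[i]);
    out[2 * n] = '\0';
    return 0;
}

static const demo_codec demo_hex = { "hex", demo_hex_encode };

/* Search function that knows only the "hex" codec. */
static const demo_codec *demo_search_builtin(const char *name)
{
    return strcmp(name, "hex") == 0 ? &demo_hex : NULL;
}
```

A caller would first call demo_register(demo_search_builtin) and
then demo_lookup("hex"); a lookup with NULL, or with a name that no
search function recognizes, yields NULL, which the caller has to
report as an error.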

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-07-30 07:30

Message:
Logged In: YES 
user_id=21627

This patch already makes use of the assumption that
PyUnicode_Check will always return 0. In all the remaining
cases, the code will also call some function of the Unicode
module, which will result in a compile time error since the
functions are not declared anymore. Even if they were declared,
the code would probably fail with a linker error, since not all
compilers will remove the entire code block. That approach can
be used only where the if-block does not call any Unicode
functions directly.

I can try to re-enable the _codecs module, although only
register and lookup would remain.

I cannot re-enable PyString_Decode/Encode, since they use 
PyUnicode_GetDefaultEncoding, which is not available since
unicodeobject.c is not compiled.

I will try to have the tokenizer generate more specific
error messages.
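
As a sketch of the idea only: a check of this shape could spot a
Unicode literal prefix and return a specific message when Unicode
support is compiled out. The function name, the
DEMO_USING_UNICODE macro and the message text are all made up for
illustration; this is not the tokenizer code from the patch.

```c
#include <stddef.h>

/* Hypothetical build switch; undefined models --disable-unicode. */
/* #define DEMO_USING_UNICODE 1 */

/* Returns NULL if the token is acceptable, otherwise a message
   explaining why it is rejected. Recognizes the u"...", u'...',
   ur"..." and ur'...' prefixes (either letter case). */
static const char *demo_check_literal_prefix(const char *tok)
{
    int has_u = (tok[0] == 'u' || tok[0] == 'U');
    int quote_next = has_u && (tok[1] == '"' || tok[1] == '\'');
    int raw_next = has_u && (tok[1] == 'r' || tok[1] == 'R') &&
                   (tok[2] == '"' || tok[2] == '\'');

    if (quote_next || raw_next) {
#ifdef DEMO_USING_UNICODE
        return NULL;
#else
        /* Illustrative wording, not the real message. */
        return "Unicode literals are not supported: "
               "this build was compiled without Unicode";
#endif
    }
    return NULL;
}
```

An identifier that merely starts with "u", such as "upper", is not
followed by a quote character and therefore passes the check.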

Support for "es", "et" is still there; they only work for
strings, though, and they never call any codecs.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-30 06:06

Message:
Logged In: YES 
user_id=38388

Nice work, Martin !

Some comments:
- I think that we could save some of the #ifdefs by simply
  assuming that an optimizing compiler will not generate code
  for "if (0)", which is what "if (PyUnicode_Check(obj))"
  becomes; this would make the code more readable
- _codecsmodule.c should not be disabled by the configure
  option... codecs are useful for non-Unicode applications as
  well
- the PyString_Encode/Decode() APIs should not be disabled for
  the same reason
- the tokenizer/compiler should generate errors with an
  explicit message stating that the Python version was
  compiled without Unicode support
- ditto for the Unicode parser markers (I think that open() on
  Windows will fail without "es"... ?)

----------------------------------------------------------------------
