[Patches] [ python-Patches-1572832 ] Fix for segfault in ISO 2022 codecs

SourceForge.net noreply at sourceforge.net
Sat Oct 7 20:00:20 CEST 2006


Patches item #1572832, was opened at 2006-10-07 14:00
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1572832&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Ray Chason (chasonr)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fix for segfault in ISO 2022 codecs

Initial Comment:
This may be related to bug report 1005078, which was
closed because it could not be reproduced from the
information given.

Running the following program causes a segmentation
fault in the Python interpreter:

--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
import sys

# Try to encode every supplementary-plane code point (U+10000..U+10FFFF).
for x in xrange(0x10000, 0x110000):
    if sys.maxunicode >= 0x10000:
        # Wide (UCS-4) build: one code unit per character.
        ch = unichr(x)
    else:
        # Narrow (UCS-2) build: build the UTF-16 surrogate pair by hand.
        ch = unichr(0xD7C0+(x>>10)) + unichr(0xDC00+(x & 0x3FF))
    try:
        # Any ISO 2022 codec will cause the segfault
        ch.encode("iso_2022_jp")
    except UnicodeEncodeError:
        pass
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
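On a narrow (UCS-2) build the reproducer cannot call unichr() on a
supplementary code point, so it builds the UTF-16 surrogate pair
itself. The constant 0xD7C0 folds the usual subtraction of 0x10000
into the high-surrogate base: for any x in U+10000..U+10FFFF,
0xD800 + ((x - 0x10000) >> 10) equals 0xD7C0 + (x >> 10), because
0x10000 >> 10 is 0x40 and 0xD800 - 0x40 is 0xD7C0. The sketch below
checks that equivalence; the function names to_surrogates and
from_surrogates are illustrative only.

```python
def to_surrogates(cp):
    """Split a supplementary code point into a UTF-16 pair, as the reproducer does."""
    assert 0x10000 <= cp <= 0x10FFFF
    high = 0xD7C0 + (cp >> 10)    # same as 0xD800 + ((cp - 0x10000) >> 10)
    low = 0xDC00 + (cp & 0x3FF)   # low 10 bits of the code point
    return high, low

def from_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

if __name__ == "__main__":
    for cp in (0x10000, 0x1F600, 0x10FFFF):
        high, low = to_surrogates(cp)
        # Check the reproducer's shortcut against the textbook formula.
        assert high == 0xD800 + ((cp - 0x10000) >> 10)
        assert from_surrogates(high, low) == cp
        print("U+%06X -> %04X %04X" % (cp, high, low))
```

The first and last code points map to D800 DC00 and DBFF DFFF, the
two ends of the surrogate range, so every value the loop produces is a
well-formed pair.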

I have verified this bug on four different Python builds:

* The current ActivePython (based on 2.4.3), running on
Windows XP SP2
* The stock Python 2.4.2 on Ubuntu Breezy (i386)
* The stock Python 2.4.2 on Ubuntu Breezy (AMD64)
* A home-built Python 2.5 on Ubuntu Breezy (i386),
configured with --enable-unicode=ucs4 and otherwise
default options

It affects not just iso_2022_jp but all of the ISO
2022 codecs.
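To check this across codecs on a given interpreter, the reproducer
can be generalized to loop over every ISO 2022 codec that Python
ships. The sketch below is written for Python 3 (3.3 and later have no
narrow builds, and chr() covers the full Unicode range); the codec
list, the sampling step of 257 and the helper name count_rejected are
choices made for this example. On an interpreter that includes the
fix it should finish and report how many sampled characters each codec
rejected (the counts may differ by codec); on an affected build it
would be expected to crash instead.

```python
ISO2022_CODECS = ["iso2022_jp", "iso2022_jp_1", "iso2022_jp_2",
                  "iso2022_jp_2004", "iso2022_jp_3", "iso2022_jp_ext",
                  "iso2022_kr"]

def count_rejected(codec, code_points):
    """Encode each code point on its own; count UnicodeEncodeError rejections."""
    rejected = 0
    for cp in code_points:
        try:
            chr(cp).encode(codec)
        except UnicodeEncodeError:
            rejected += 1
    return rejected

if __name__ == "__main__":
    # Every 257th supplementary code point keeps the run short; use a
    # step of 1 to cover the whole range as the original reproducer does.
    sample = range(0x10000, 0x110000, 257)
    for name in ISO2022_CODECS:
        print(name, count_rejected(name, sample), "of", len(sample), "rejected")
```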

If you are attempting to reproduce the bug on Linux,
you may get more repeatable results if you first become
root and disable address-space layout randomization:

    echo 0 > /proc/sys/kernel/randomize_va_space

This seems related to bug report 1005078.  However, bug
report 1005078 claimed that a character in the BMP
could cause a crash.  I have not reproduced that bug
using a BMP character; however, supplementary
characters can in fact cause the ISO 2022 codecs to crash.

The problem is that four functions in
Modules/cjkcodecs/_codecs_iso2022.c do not check that
the code point is less than 0x10000 before invoking the
TRYMAP_ENC macro. Without that check, a supplementary
code point makes the lookup run past the bounds of the
encoding table. The four functions are:

* ksx1001_encoder
* jisx0208_encoder
* jisx0212_encoder
* gb2312_encoder
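The failure mode can be modeled outside C. The sketch below is a
simplified, hypothetical Python model, not the real CJKCodecs data
structures: it assumes each encoding table has one row per high byte
of a 16-bit code point (256 rows), which is consistent with the bounds
overrun described above. An unchecked lookup of a supplementary code
point then selects row cp >> 8, which is 256 or more; in C that reads
past the end of the array, while in the model it raises IndexError.
encode_checked() shows the kind of guard the patch adds. The names
make_encmap, trymap_enc and encode_checked are illustrative only.

```python
ROWS = 256  # assumed layout: one row per high byte of a BMP code point

def make_encmap(pairs):
    """Build a simplified two-level map from {code_point: encoded_value}."""
    encmap = [None] * ROWS
    for cp, value in pairs.items():
        if cp >= 0x10000:
            raise ValueError("this table layout only covers the BMP")
        if encmap[cp >> 8] is None:
            encmap[cp >> 8] = {}
        encmap[cp >> 8][cp & 0xFF] = value
    return encmap

def trymap_enc(encmap, cp):
    """Unchecked lookup, analogous to invoking TRYMAP_ENC without a guard.
    For cp >= 0x10000 the row index cp >> 8 is 256 or more: an
    out-of-bounds read in C, an IndexError in this model."""
    row = encmap[cp >> 8]
    if row is None:
        return None
    return row.get(cp & 0xFF)

def encode_checked(encmap, cp):
    """Guarded lookup: reject non-BMP code points before indexing the table."""
    if cp >= 0x10000:
        return None  # treated as unencodable by the caller
    return trymap_enc(encmap, cp)

if __name__ == "__main__":
    # Illustrative entry: U+3042 HIRAGANA LETTER A is 0x2422 in JIS X 0208.
    table = make_encmap({0x3042: 0x2422})
    print(hex(encode_checked(table, 0x3042)))   # 0x2422
    print(encode_checked(table, 0x10000))       # None
    try:
        trymap_enc(table, 0x10000)
    except IndexError:
        print("unchecked lookup ran off the end of the table")
```

In this model, returning None for a non-BMP code point stands in for
the codec reporting the character as unmappable, so the caller sees an
ordinary encoding error rather than a memory access outside the table.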

The enclosed patch adds the necessary checks, and the
above program then completes without incident.  It is
derived from the official 2.5 release, but also applies
cleanly against the daily drop of 6 October 2006
because the file Modules/cjkcodecs/_codecs_iso2022.c is
unchanged in that drop.


----------------------------------------------------------------------


