[I18n-sig] Test Suite for the Unicode codecs

M.-A. Lemburg mal@lemburg.com
Sat, 01 Apr 2000 00:15:53 +0200


I would like to add more tests for the mapping codecs
in the Python encodings package. Right now I can only test
round-trips over the lower character ordinal ranges, and even
those tests fail for a couple of encodings.
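A finer-grained variant of such a round-trip test can check each byte value on its own, so that a failure points at specific code points instead of a whole range. The sketch below is written in modern Python 3 rather than the older `unicode()`/print-statement syntax used in the script below. The function name `failing_bytes` is hypothetical, and testing one byte at a time assumes a single-byte (charmap-style) codec; it would not be meaningful for multibyte or stateful codecs.

```python
def failing_bytes(encoding, start=0, stop=256):
    """Return the byte values in range(start, stop) that do not survive a
    decode/encode round-trip through the given single-byte codec."""
    failures = []
    for value in range(start, stop):
        raw = bytes([value])
        try:
            # Decodes fine, but encodes back to a different byte sequence
            if raw.decode(encoding).encode(encoding) != raw:
                failures.append(value)
        except UnicodeError:
            # Undefined in the decode or encode direction
            failures.append(value)
    return failures


print(failing_bytes('latin_1'))                                   # → []
print(failing_bytes('ascii', 128, 256) == list(range(128, 256)))  # → True
```

Catching `UnicodeError` covers both `UnicodeDecodeError` and `UnicodeEncodeError`, so undefined mappings and mismatched round-trips end up in the same list.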

Does anyone have access to a reference test suite for
these mappings? The mapping codec itself is probably not the
cause of these errors. Perhaps the maps themselves
aren't of high enough quality, or maybe some mappings
just cannot provide round-trip safety...
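To illustrate why a mapping that is not one-to-one cannot be round-trip safe, here is a toy example in Python 3. The three-entry table is hypothetical and not taken from any of the real encodings. When two byte values decode to the same character, an encoding table built by inverting the decoding table can keep only one of them, so the other byte comes back changed.

```python
# Hypothetical toy codec: bytes 0x80 and 0x81 both decode to U+20AC,
# so the inverted (encoding) table can only remember one of them.
DECODING_TABLE = {0x41: 'A', 0x80: '\u20ac', 0x81: '\u20ac'}

# Invert the decoding table; for duplicate characters the later byte wins.
ENCODING_TABLE = {char: value for value, char in DECODING_TABLE.items()}


def toy_decode(data):
    """Decode bytes using the toy decoding table."""
    return ''.join(DECODING_TABLE[b] for b in data)


def toy_encode(text):
    """Encode a string using the inverted toy table."""
    return bytes(ENCODING_TABLE[c] for c in text)


original = b'A\x80\x81'
round_tripped = toy_encode(toy_decode(original))
print(round_tripped)              # → b'A\x81\x81'
print(round_tripped == original)  # → False
```

Which byte survives depends only on how the inverse table is built; here the later entry simply overwrites the earlier one, so byte 0x80 is lost.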

Here are my findings in the form of a Python test script with comments.
The tests first decode an encoded string into Unicode and then
encode it back. Some encodings have undefined mappings even in the
lower ranges, and others seem to be 1-n rather than 1-1.
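Assuming that "1-n" here means several byte values decoding to the same Unicode character (one plausible reading of round-trip failures that raise no error), a Python 3 sketch like the following could list those collisions for a given single-byte codec. The name `shared_targets` is hypothetical; undefined bytes are skipped because they already show up as decode errors.

```python
from collections import defaultdict


def shared_targets(encoding, start=0, stop=256):
    """Map each decoded string that is produced by more than one byte value
    in range(start, stop) to the list of those byte values."""
    sources = defaultdict(list)
    for value in range(start, stop):
        try:
            sources[bytes([value]).decode(encoding)].append(value)
        except UnicodeDecodeError:
            pass  # undefined byte: a different kind of failure, skipped here
    # Keep only the targets reached from two or more byte values
    return {text: values for text, values in sources.items() if len(values) > 1}


print(shared_targets('latin_1'))  # → {}
```

A non-empty result names exactly the bytes that can never all survive a round-trip, since encoding can map each character back to only one of them.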

print 'Testing standard mapping codecs...',

print '0-127...',
s = ''.join(map(chr, range(128)))
for encoding in (
    'cp037', 'cp1026',
    'cp437', 'cp500', 'cp737', 'cp775', 'cp850',
    'cp852', 'cp855', 'cp860', 'cp861', 'cp862',
    'cp863', 'cp865', 'cp866', 
    'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15',
    'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6',
    'iso8859_7', 'iso8859_9', 'koi8_r', 'latin_1',
    'mac_cyrillic', 'mac_latin2',

    'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255',
    'cp1256', 'cp1257', 'cp1258',
    'cp856', 'cp857', 'cp864', 'cp869', 'cp874',

    'mac_greek', 'mac_iceland','mac_roman', 'mac_turkish',
    'cp1006', 'cp875', 'iso8859_8',
    
    ### These have undefined mappings:
    #'cp424',
    
    ):
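    try:
        # Compare explicitly instead of using assert, which is
        # stripped when Python runs with -O
        if unicode(s,encoding).encode(encoding) != s:
            print '*** codec "%s" failed round-trip' % encoding
    except ValueError,why:
        # UnicodeError (undefined mappings) is a subclass of ValueError
        print '*** codec for "%s" failed: %s' % (encoding, why)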

print '128-255...',
s = ''.join(map(chr, range(128,256)))
for encoding in (
    'cp037', 'cp1026',
    'cp437', 'cp500', 'cp737', 'cp775', 'cp850',
    'cp852', 'cp855', 'cp860', 'cp861', 'cp862',
    'cp863', 'cp865', 'cp866', 
    'iso8859_10', 'iso8859_13', 'iso8859_14', 'iso8859_15',
    'iso8859_2', 'iso8859_3', 'iso8859_4', 'iso8859_5', 'iso8859_6',
    'iso8859_7', 'iso8859_9', 'koi8_r', 'latin_1',
    'mac_cyrillic', 'mac_latin2',
    
    ### These have undefined mappings:
    #'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255',
    #'cp1256', 'cp1257', 'cp1258',
    #'cp424', 'cp856', 'cp857', 'cp864', 'cp869', 'cp874',
    #'mac_greek', 'mac_iceland','mac_roman', 'mac_turkish',
    
    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',
    
    ):
    try:
        if unicode(s,encoding).encode(encoding) != s:
            print '*** codec "%s" failed round-trip' % encoding
    except ValueError,why:
        print '*** codec for "%s" failed: %s' % (encoding, why)

print 'done.'

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/