char 128? no... 256

Roman Suzi rnd at onego.ru
Wed Feb 12 12:18:02 EST 2003


On Wed, 12 Feb 2003, Afanasiy wrote:
>On Wed, 12 Feb 2003 15:50:53 GMT, Afanasiy <abelikov72 at hotmail.com> wrote:
>>On Wed, 12 Feb 2003 03:18:43 GMT, Afanasiy <abelikov72 at hotmail.com> wrote:

>>Now, even encoding to 'latin-1' (8 bit) is problematic, because symbols
>>which are 8 bit in Windows, such as the TradeMark symbol, will not encode
>>into 8 bit: the ordinal value in the Unicode object is 8482.
>>
>>This is hex 99 on a plain Windows 2000 install, I presume 'latin-1'.

That is because your Windows doesn't use latin-1. Byte 0x99 is the
trademark sign only in the Windows code pages (and palmos):

$ grep -i trade /usr/local/lib/python2.3/encodings/*.py
cp1250.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1251.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1252.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1253.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1254.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1255.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1256.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1257.py:        0x0099: 0x2122, # TRADE MARK SIGN
cp1258.py:        0x0099: 0x2122, # TRADE MARK SIGN
mac_cyrillic.py:        0x00aa: 0x2122, # TRADE MARK SIGN
mac_greek.py:        0x0093: 0x2122, # TRADE MARK SIGN
mac_iceland.py:        0x00aa: 0x2122, # TRADE MARK SIGN
mac_latin2.py:        0x00aa: 0x2122, # TRADE MARK SIGN
mac_roman.py:        0x00aa: 0x2122, # TRADE MARK SIGN
mac_turkish.py:        0x00aa: 0x2122, # TRADE MARK SIGN
palmos.py:        0x0099: 0x2122, #       TRADE MARK SIGN

So, you need to convert to one of these instead of latin-1.

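(Hmmm... I thought cp1250 was latin1, but it is not: cp1250 is the Windows
Central European code page, and cp1252 is the Western European one that
is closest to latin-1.)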
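
The grep output above can be double-checked from Python itself. This is
only a small sketch: the helper name encodes_to is made up for
illustration, and it is written to run on current Python as well as the
2.x of this thread.

```python
# Check which codecs can represent U+2122 TRADE MARK SIGN at all.
TRADE_MARK = u"\u2122"

def encodes_to(codec_name, text=TRADE_MARK):
    """Return the encoded bytes, or None if the codec cannot encode text.

    encodes_to is a hypothetical helper, not part of any library.
    """
    try:
        return text.encode(codec_name)
    except UnicodeEncodeError:
        # latin-1 ends up here: it has no code point for U+2122.
        return None

for name in ("latin-1", "cp1250", "cp1252", "mac_roman"):
    print(name, repr(encodes_to(name)))
```

Only latin-1 should come back as None; the code pages listed by grep
each return a single byte.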

Aliases of latin-1:
    '8859'               : 'latin_1',
    'cp819'              : 'latin_1',
    'csisolatin1'        : 'latin_1',
    'ibm819'             : 'latin_1',
    'iso8859'            : 'latin_1',
    'iso_8859_1'         : 'latin_1',
    'iso_8859_1_1987'    : 'latin_1',
    'iso_ir_100'         : 'latin_1',
    'l1'                 : 'latin_1',
    'latin'              : 'latin_1',
    'latin1'             : 'latin_1',
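
The same alias table can be read at run time from the standard
encodings.aliases module, which is a quick way to see that none of the
cp125x names is an alias of latin-1. A rough sketch (aliases_for is a
hypothetical helper, and the exact alias set depends on your Python
version):

```python
from encodings.aliases import aliases

def aliases_for(codec_module):
    """List every alias that maps to the given codec module name.

    aliases_for is a hypothetical helper; 'aliases' is the real
    dictionary from the standard library.
    """
    return sorted(k for k, v in aliases.items() if v == codec_module)

latin1_aliases = aliases_for("latin_1")
print(latin1_aliases)

# No Windows code page name should appear among the latin-1 aliases.
print([name for name in latin1_aliases if name.startswith("cp125")])
```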


>>(Which is iso-8859-1 afaik) This will show up in webpages designated :
>>
>><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
>>
>>This will show up in notepad... and in my non-unicode text editors.
>>
>>It always shows up as the TradeMark symbol.
>>
>>So how would I encode this Unicode character, 8482, so that it would
>>show up as a TradeMark symbol on Windows 2000 machines? Windows 2000
>>can display a TradeMark symbol in non-Unicode applications.
>
>To clarify, the TradeMark symbol is being transformed to Unicode #8482
>automatically, presumably by COM or ADO. In Python, I do not know how
>I am supposed to be able to print (for example) the Unicode object I
>receive which contains this transformed TradeMark symbol.

# Portable fallback: latin-1 has no trademark sign, so substitute ASCII first.
s = u"Your Unicode string\u2122"
s = s.replace(u"\u2122", u"(tm)")
print s.encode("latin-1")

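Or, most probably, encode to your Windows code page (cp1252 on a Western
European install; cp1250 also has the trademark sign at 0x99):

s = u"Bill Gates Makes Your Life Interesting\u2122"
print s.encode("cp1252")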
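
The two approaches can be combined: try the target code page first and
substitute "(tm)" only when the codec cannot represent the character.
This is a sketch under assumptions, not a recommendation from the
standard library: encode_for_display and its cp1252 default are made up
for illustration.

```python
TRADE_MARK = u"\u2122"

def encode_for_display(text, codec="cp1252"):
    """Encode text for an 8-bit display, degrading gracefully.

    Hypothetical helper: the cp1252 default assumes a Western European
    Windows install.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        # The codec lacks some character: spell out the trademark sign
        # and let any other unencodable character become '?'.
        return text.replace(TRADE_MARK, u"(tm)").encode(codec, "replace")

print(repr(encode_for_display(u"Python\u2122")))
print(repr(encode_for_display(u"Python\u2122", "latin-1")))
```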


Sincerely yours, Roman Suzi

P.S. All Trademarks belong to their respective owners. ;-)
-- 
rnd at onego.ru =\= My AI powered by Linux RedHat 7.3