[ python-Bugs-1313051 ] mac_roman codec missing "apple" codepoint

SourceForge.net noreply at sourceforge.net
Tue Oct 4 23:48:09 CEST 2005


Bugs item #1313051, was opened at 2005-10-04 18:37
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1313051&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Nelson (tony_nelson)
Assigned to: Nobody/Anonymous (nobody)
Summary: mac_roman codec missing "apple" codepoint

Initial Comment:
The mac_roman codec is missing a single codepoint for
the trademarked Apple logo (0xF0 <=> 0xF8FF per Apple
docs), which prevents round-tripping of mac_roman text
through Unicode.  Adding the codepoint as a private
encoding (per Apple) has no trademark implications;
only the character itself, in a font, would have such
issues.

I'm using Python 2.3, but AFAICT it is an issue in
later Python versions as well.
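
A minimal sketch of how the round-trip claim could be checked, assuming a
Python 3 interpreter rather than the Python 2.3 named in the report. The
helper name roundtrip_failures is hypothetical and not part of any library.
It decodes each single byte value with a codec and records the values that
are rejected or that do not encode back to the same byte.

```python
def roundtrip_failures(codec):
    """Return byte values (0-255) that do not survive bytes -> str -> bytes."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(codec).encode(codec) != raw:
                failures.append(value)
        except UnicodeError:
            # Strict decoding or encoding rejected this byte value.
            failures.append(value)
    return failures


# On an interpreter whose mac_roman table lacks the 0xF0 mapping,
# the value 240 (0xF0) would be expected to appear in this list.
print(roundtrip_failures("mac_roman"))
```

An empty list means every byte value round-trips through Unicode for that
codec.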

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2005-10-04 23:48

Message:
Logged In: YES 
user_id=38388

Tony, comments like yours are not very helpful.

Python's codecs rely on facts defined by standards bodies,
e.g. the Unicode consortium, ISO, etc. If you don't present
proof of your claim then there's nothing much we can do
about your particular problem.

Fortunately, proof isn't hard to find in this case:

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT

Looks like Apple added the mapping sometime after the codec
was created.

Walter: it is common for companies to add their logos as
private Unicode characters. This happens a lot in the Asian
world. Of course, interop isn't great, but at least you
don't lose information by converting to Unicode.

Tony: Python is not damaging your data - the codec will
raise an exception in case that particular character is
converted to Unicode.

Please recreate the codec using gencodec.py (which you can
find in the Tools/ directory) and add it as an attachment to
this bug report. Thanks.
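
To illustrate what regenerating a codec from a mapping file involves, here
is a rough sketch that is not gencodec.py and does not use its interface.
It assumes the usual layout of Unicode mapping files (a hex byte, a hex code
point, then a "#" comment). The function name parse_mapping and the two-line
SAMPLE excerpt are hypothetical; only the 0xF0 <=> 0xF8FF pair comes from
this report.

```python
def parse_mapping(text):
    """Parse '0xNN  0xNNNN  # comment' lines into {byte: code point}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


# Illustrative excerpt only, not a copy of the real ROMAN.TXT.
SAMPLE = """\
# byte   unicode   name
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xF0\t0xF8FF\t# Apple logo
"""

table = parse_mapping(SAMPLE)
print(hex(table[0xF0]))
```

A real codec module would turn such a table into decoding and encoding
maps; that step is omitted here.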


----------------------------------------------------------------------

Comment By: Tony Nelson (tony_nelson)
Date: 2005-10-04 22:41

Message:
Logged In: YES 
user_id=1356214

It isn't Python's job to tell people what characters they
are allowed to use.  Apple defined the codepoint and its
mapping to Unicode.  Python is not the Unicode Police, and
should not damage the data it was given just to prove a
point.  Damaging the user's data isn't very "batteries
included".

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2005-10-04 20:07

Message:
Logged In: YES 
user_id=89016

The codepoint 0xF8FF is in the Private Use Area, so this is
not an official Unicode character, and for other uses 0xF8FF
might mean something completely different. So I think this
mapping shouldn't be added to mac_roman.
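
The Private Use Area point can be checked with the standard unicodedata
module; a minimal sketch, assuming Python 3. The helper name is_private_use
is hypothetical. Private-use code points carry the general category "Co".

```python
import unicodedata

APPLE_LOGO = "\uf8ff"  # code point named in this report


def is_private_use(ch):
    """True if the character's general category is 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


print(is_private_use(APPLE_LOGO), is_private_use("A"))
```

Because the category is private use, the character has no meaning assigned
by the Unicode standard; its interpretation depends on agreement between
sender and receiver.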

----------------------------------------------------------------------


