Using more than 7-bit ASCII on Windows.

Mark Hammond MarkH at ActiveState.com
Mon Oct 30 09:05:25 EST 2000


Paul Moore wrote:

> I can accept this. But I still don't know how to enter a literal
> string containing a "£" character into Python. More explicitly, Python
> accepts the line
> 
>     >>> s = "£"
> 
> But what does this *mean* (ie, what should I expect the semantics of s
> to be?)

This means that you have a Python string containing a single character 
whose value is > 127, i.e. outside the ASCII range.  It so happens that, 
given your current font and code page, this shows up as a British pound 
symbol.

It is _not_ the canonical representation of a British pound symbol - it 
just happens to be that symbol in your code page.
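
To make the code page point concrete, here is a small sketch.  It 
assumes a current Python 3 (so bytes and text are separate types, 
unlike the strings discussed in this post), and the three code pages 
are only examples I picked, not necessarily the ones on your machine.  
The variable names are made up for illustration.

```python
# The single byte 0xA3 (octal \243) has no meaning of its own;
# what it displays as depends on the code page used to decode it.
raw = b"\xa3"

as_latin1 = raw.decode("latin-1")   # ISO 8859-1, Western European
as_cp1252 = raw.decode("cp1252")    # Windows ANSI, Western European
as_cp437 = raw.decode("cp437")      # original IBM PC / DOS code page

# ascii() keeps the output independent of the console's own encoding.
print(ascii(as_latin1), ascii(as_cp1252), ascii(as_cp437))
# prints: '\xa3' '\xa3' '\xfa'   (pound, pound, u-acute)

# Going the other way, the pound sign is a different byte in cp437.
pound_in_cp437 = "\u00a3".encode("cp437")
print(pound_in_cp437)
# prints: b'\x9c'
```

So the same byte is a pound sign under two of these code pages and a 
"ú" under the third.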

> No, I can be convinced. However, what I don't see is how I should
> write a literal string which os.chdir will recognise as being composed
> of the characters '1', '0' and '£' (ie, in whatever form necessary to
> cause Python to change directory to the directory I created at NT's
> command line with "mkdir 10£" in my Latin-1 setup.)

*sigh* Pythonwin is still mis-behaving here :-(  Well, in this case it 
is actually _behaving_.  The following works for me from Pythonwin, but 
not from a DOS box.

>>> import os
>>> os.getcwd()
'F:\\'
>>> d = "\\test£"
>>> os.chdir(d)
>>> os.getcwd()
'F:\\test\243'
>>> print os.getcwd()
F:\test£

In the DOS box, I can't make it work :-(

> BTW, if Python has chosen not to assign a meaning to characters above
> 127 (my interpretation of your comment "Python has chosen not to have
> a default character set"), does that not imply that string literals
> containing characters >127 should raise an exception? 

When they need to be interpreted as a "character" and no encoding is 
available, this is exactly what happens (the dreaded "UnicodeError: 
ASCII encoding error: ordinal not in range(128)" exception).
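
A minimal sketch of that failure, assuming a current Python 3, where 
the exception is a UnicodeDecodeError (a subclass of UnicodeError) and 
the message wording differs slightly from the one quoted above:

```python
# A byte > 127 cannot be interpreted as a character by the ASCII codec.
raw = b"\xa3"

try:
    raw.decode("ascii")
    error = None
except UnicodeError as exc:
    # UnicodeDecodeError is a subclass of UnicodeError.
    error = exc

print(type(error).__name__)
print(error)
```

The message still ends with "ordinal not in range(128)".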

Unfortunately, Python's historic use of string objects for binary data 
(e.g. as returned by file.read()) means that this rule cannot be applied 
to all strings everywhere.

Python cannot define a meaning for string characters > 127, as they 
only have one in the context of a code page (or encoding).  The 
operating system may have a default encoding, and it is arguable that 
Python should use this as the default (although there are also good 
arguments against it).  Either way, Python itself will still not assign 
any special meaning to bytes in this range in string objects.
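
As a rough illustration of the "operating system default" idea, here 
is a sketch assuming a current Python 3 and its standard locale module.  
The directory name is just the example from this thread, and what the 
default turns out to be depends entirely on the machine it runs on.

```python
import locale

# Ask the platform which encoding it prefers for text.
# The answer varies from machine to machine.
os_default = locale.getpreferredencoding(False)

# The bytes themselves carry no encoding: just the values 49, 48, 163.
raw = b"10\xa3"
print(list(raw))
# prints: [49, 48, 163]

# Interpreting them as text is a separate, explicit step.  With
# errors="replace", a byte that is invalid in the chosen encoding
# becomes a replacement character instead of raising an exception.
text = raw.decode(os_default, errors="replace")
print(os_default, ascii(text))
```

Whether the last character comes out as a pound sign depends on the 
encoding the platform reports, which is the point made above.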

> Fair. So the question remains, how do I chdir to a directory named
> "10£"?

No idea.

> (My issue is different from the display issue which started the thread
> - sorry, it looks like my misunderstanding has muddied the waters by
> linking 2 unrelated issues...)

And my complete misunderstanding of _all_ the issues isn't helping 
either ;-)  So that said, I would wait a few days for corrections to 
this post to appear before I believed any of it!

Mark.



