Try this

mensanator at aol.com mensanator at aol.com
Sun Sep 16 20:58:09 EDT 2007


On Sep 16, 6:21 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Sep 17, 8:53 am, "mensana... at aol.com" <mensana... at aol.com> wrote:
>
> > On Sep 16, 5:28 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > > On Sep 17, 7:54 am, "mensana... at aol.com" <mensana... at aol.com> wrote:
>
> > > > On Sep 16, 2:22 pm, Steve Holden <st... at holdenweb.com> wrote:
>
> > > > > mensana... at aol.com wrote:
> > > > > > > On Sep 16, 1:10 pm, Dennis Lee Bieber <wlfr... at ix.netcom.com> wrote:
> > > > > >> On Sun, 16 Sep 2007 01:46:34 -0700, GeorgeRXZ <george... at gmail.com>
> > > > > >> declaimed the following in comp.lang.python:
>
> > > > > >>> Then Open the Notepad and type the following sentence, and save the
> > > > > >>> file and close the notepad. Now reopen the file and you will find out
> > > > > >>> that, Notepad is not able to save the following text line.
> > > > > >>> Well you are speed
> > > > > >>> This occurs not only with above sentence but any sentence that has
> > > > > >>> 4 3 3 5 (sequence of characters: Well=4 you=3 are=3 speed=5)
> > > > > >>         I tried. I also opened the saved file in SciTE...
> > > > > >> And the text WAS there...
>
> > > > > >>         It is Notepad that can not properly render what it,
> > > > > >> itself, saved.
>
> > > > > > C:\Documents and Settings\mensanator\My Documents>type huh.txt
> > > > > > Well you are speed
>
> > > > > > Yes, file was saved correctly.
> > > > > > But reopening it shows 9 unprintable characters.
> > > > > > If I copy those to a new file (huh1.txt):
>
> > > > > > C:\Documents and Settings\mensanator\My Documents>type huh1.txt
> > > > > > ?????????
>
> > > > > > But wait...the new file is 20 characters, not 9.
>
> > > > > > 09/16/2007  01:44 PM                18 huh.txt
> > > > > > 09/16/2007  01:54 PM                20 huh1.txt
>
> > > > > > C:\Documents and Settings\mensanator\My Documents>dump huh.txt
> > > > > > huh.txt:
> > > > > > 00000000  5765 6c6c 2079 6f75 2061 7265 2073 7065 Well you are spe
> > > > > > 00000010  6564                                    ed
>
> > > > > > Here's what it's actually doing:
>
> > > > > > C:\Documents and Settings\mensanator\My Documents>dump huh1.txt
> > > > > > huh1.txt:
> > > > > > 00000000  fffe 5765 6c6c 2079 6f75 2061 7265 2073 .~Well you are s
> > > > > > 00000010  7065 6564                               peed
>
> > > > > One word: Unicode.
>
> > > > > The "open" and "save" dialogs allow you to specify an encoding.
>
> > > > And the encoding specified was ANSI.
>
> > > > > If you
> > > > > specify Unicode then you will get what you see above.
>
> > > > And if you specify ANSI _before_ you click the file name,
> > > > the specification switches to Unicode and has to then
> > > > be manually switched back to ANSI.
>
> > > > > If you specify ANSI
> > > > > you will get the text you entered.
>
> > > > It's still a bug in the "open" dialog.
>
> > > It's more like a bug/feature in its encoding detector.
>
> > It is NOT a feature. If I save something as ANSI,
> > there is no excuse for it not to re-open in ANSI.
>
> It doesn't know that you or anybody else saved it as "ANSI". All it is
> seeing is a string of bytes.
>
> If you are silly enough to type in
> [that's "\xef\xbb\xbf" repeated a few times]
> and save it as "ANSI", it has every excuse to open it as something
> else :-)
>

Did you notice that those three bytes all have bit 7 set?

So they are not ASCII.

There is no excuse to treat a string of ASCII codes as
anything other than ASCII without specific direction
from the user.
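
To make the bit-7 point concrete, here is a minimal Python sketch.
The helper name is_pure_ascii is made up for illustration; it is not
anything Notepad actually uses, only a way to show which byte strings
qualify as plain ASCII.

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if no byte in data has bit 7 (0x80) set."""
    return all((b & 0x80) == 0 for b in data)


# The 18 bytes that were saved to huh.txt.
text_bytes = b"Well you are speed"

# The counterexample from the quoted post: "\xef\xbb\xbf" repeated.
bom_bytes = b"\xef\xbb\xbf" * 3

print(is_pure_ascii(text_bytes))  # every byte is below 0x80
print(is_pure_ascii(bom_bytes))   # every byte has bit 7 set

# Show the bit patterns of the three counterexample bytes.
print([format(b, "08b") for b in b"\xef\xbb\xbf"])
```

The first call reports that the saved sentence is pure ASCII; the
second reports that the counterexample is not, so the two cases are
not comparable.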

>
> > > I can get it to
> > > switch to Unicode only if there's an even number of characters AND the
> > > line is NOT terminated by CRLF -- add/remove one alpha character, or
> > > hit the enter key at the end of the line, and it won't detect it as
> > > Unicode when you open it again.
>
> > > You only get the BOM (0xfffe) if you are silly enough to save it while
> > > it's open in Unicode mode.
>
> > That was a test. I wasn't so stupid as to save
> > to the original file, but to make a copy.
>
> > > > > By the way, this has precisely what to do with Python?
>
> > > > I've been known to use Notepad to create Python
> > > > source code.
>
> > > Your source code would have to be trivially short to trigger the
> > > strange behaviour.
>
> > Makes you wonder what other edge cases aren't
> > handled properly.
>
> > Makes you wonder why Microsoft doesn't employ
> > professional programmers.
>
> I'm eagerly awaiting publication of your professional specification
> for correctly detecting the encoding of an arbitrary stream of
> bytes

The very presence of an algorithm to detect encoding is a bug.
Files with the .txt extension should always be treated as ANSI
even if they contain binary data. Notepad should never be
allowed to try to decide what the encoding is if the open
dialog has the encoding set to ANSI.
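
A rough Python sketch of that policy, and of what the dumps earlier in
the thread show. Assumptions: "ANSI" is taken to mean Windows code page
1252 (the usual Western default, but it depends on the system locale),
and decode_as_requested is a hypothetical helper, not a Notepad API.

```python
import codecs


def decode_as_requested(data: bytes, encoding: str = "cp1252") -> str:
    """Decode with the encoding the user chose; never guess.

    cp1252 standing in for "ANSI" is an assumption. Undecodable bytes
    are replaced rather than triggering any encoding detection.
    """
    return data.decode(encoding, errors="replace")


raw = b"Well you are speed"  # the 18 bytes stored in huh.txt

# Honoring the ANSI setting gives back the 18 characters typed.
as_ansi = decode_as_requested(raw)

# Misreading the same bytes as UTF-16 little-endian pairs them up,
# which yields 9 characters instead of 18.
as_utf16 = decode_as_requested(raw, "utf-16-le")

# Saving the misread text as "Unicode" prepends the FF FE byte order
# mark to the same 18 bytes, for 20 bytes in total.
resaved = codecs.BOM_UTF16_LE + as_utf16.encode("utf-16-le")

print(len(as_ansi), repr(as_ansi))
print(len(as_utf16))
print(len(resaved), resaved[:2].hex())
```

The lengths printed (18, 9 and 20) line up with the file sizes and the
9 unprintable characters reported for huh.txt and huh1.txt.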



