Re: Try this



On Sep 16, 6:21 pm, John Machin <sjmac...@xxxxxxxxxxx> wrote:
On Sep 17, 8:53 am, "mensana...@xxxxxxx" <mensana...@xxxxxxx> wrote:





On Sep 16, 5:28 pm, John Machin <sjmac...@xxxxxxxxxxx> wrote:

On Sep 17, 7:54 am, "mensana...@xxxxxxx" <mensana...@xxxxxxx> wrote:

On Sep 16, 2:22 pm, Steve Holden <st...@xxxxxxxxxxxxx> wrote:

mensana...@xxxxxxx wrote:
On Sep 16, 1:10 pm, Dennis Lee Bieber <wlfr...@xxxxxxxxxxxxx> wrote:
On Sun, 16 Sep 2007 01:46:34 -0700, GeorgeRXZ <george...@xxxxxxxxx>
declaimed the following in comp.lang.python:

Then Open the Notepad and type the following sentence, and save the
file and close the notepad. Now reopen the file and you will find out
that, Notepad is not able to save the following text line.
Well you are speed
This occurs not only with above sentence but any sentence that has
4 3 3 5 (sequence of characters: Well=4 you=3 are=3 speed=5)
I tried. I also opened the saved file in SciTE...
And the text WAS there...

It is Notepad that cannot properly render what it
itself saved.

C:\Documents and Settings\mensanator\My Documents>type huh.txt
Well you are speed

Yes, file was saved correctly.
But reopening it shows 9 unprintable characters.
If I copy those to a new file (huh1.txt):

C:\Documents and Settings\mensanator\My Documents>type huh1.txt
?????????

But wait... the new file is 20 bytes, not 9.

09/16/2007 01:44 PM 18 huh.txt
09/16/2007 01:54 PM 20 huh1.txt

C:\Documents and Settings\mensanator\My Documents>dump huh.txt
huh.txt:
00000000  5765 6c6c 2079 6f75 2061 7265 2073 7065  Well you are spe
00000010  6564                                     ed

Here's what it's actually doing:

C:\Documents and Settings\mensanator\My Documents>dump huh1.txt
huh1.txt:
00000000  fffe 5765 6c6c 2079 6f75 2061 7265 2073  .~Well you are s
00000010  7065 6564                                peed

One word: Unicode.
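
A minimal Python sketch of what the two dumps show, assuming Notepad reread the 18 ANSI bytes as little-endian UTF-16 and then wrote the copy back out with a byte order mark. The variable names are illustrative only.

```python
import codecs

# The 18 bytes Notepad wrote in ANSI mode (huh.txt in the dump above).
raw = "Well you are speed".encode("ascii")

# Assumption: on reopening, the bytes are paired up as UTF-16LE code units.
misread = raw.decode("utf-16-le")

# Saving the misread text as "Unicode" prepends the BOM bytes ff fe and
# leaves the original 18 bytes unchanged (huh1.txt in the dump above).
resaved = codecs.BOM_UTF16_LE + misread.encode("utf-16-le")

print(len(raw), len(misread), len(resaved))
print(resaved[:2].hex(), resaved[2:] == raw)
```

That would account for the 9 characters on screen and for the 18 versus 20 byte file sizes.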

The "open" and "save" dialogs allow you to specify an encoding.

And the encoding specified was ANSI.

If you
specify Unicode then you will get what you see above.

And if you specify ANSI _before_ you click the file name,
the specification switches to Unicode and has to then
be manually switched back to ANSI.

If you specify ANSI
you will get the text you entered.

It's still a bug in the "open" dialog.

It's more like a bug/feature in its encoding detector.

It is NOT a feature. If I save something as ANSI,
there is no excuse for it not to re-open in ANSI.

It doesn't know that you or anybody else saved it as "ANSI". All it is
seeing is a string of bytes.

If you are silly enough to type in
[that's "\xef\xbb\xbf" repeated a few times]
and save it as "ANSI", it has every excuse to open it as something
else :-)
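
A short Python sketch of that example, assuming "ANSI" means code page 1252 (the Western European Windows default). The three characters that encode to those bytes are exactly the UTF-8 byte order mark, so a reader that honours the BOM sees something quite different from what was typed.

```python
import codecs

# What the user would type: i-diaeresis, right guillemet, inverted question mark,
# repeated three times.
typed = "\u00ef\u00bb\u00bf" * 3

# Saved as "ANSI" (assumed here to be cp1252), this is the UTF-8 BOM three times over.
saved = typed.encode("cp1252")
print(saved == codecs.BOM_UTF8 * 3)

# A BOM-aware UTF-8 reader drops the first BOM and decodes the other two
# as U+FEFF characters instead of the nine characters that were typed.
reopened = saved.decode("utf-8-sig")
print(len(typed), len(reopened))
```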


Did you notice that those three bytes all have bit 7 set?

So they are not ASCII.

There is no excuse to treat a string of ASCII codes as
anything other than ASCII without specific direction
from the user.
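
The bit 7 point can be checked in a few lines of Python. This is only a sketch of the "is it pure ASCII" test being argued for; the helper name `is_ascii` is made up for illustration.

```python
import codecs

def is_ascii(data: bytes) -> bool:
    # ASCII uses only 7 bits, so bit 7 (mask 0x80) must be clear in every byte.
    return all((byte & 0x80) == 0 for byte in data)

print(is_ascii(b"Well you are speed"))  # every byte is below 0x80
print(is_ascii(codecs.BOM_UTF8))        # ef bb bf all have bit 7 set
```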


I can get it to
switch to Unicode only if there's an even number of characters AND the
line is NOT terminated by CRLF -- add/remove one alpha character, or
hit the enter key at the end of the line, and it won't detect it as
Unicode when you open it again.
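
The "even number of characters" condition at least is easy to illustrate in Python: UTF-16 code units are two bytes each, so an odd-length file cannot be a whole number of them. The function below is a hypothetical check written for this post, not Notepad's actual detector, and it does not reproduce the CRLF behaviour described above.

```python
def could_be_utf16(data: bytes) -> bool:
    # Hypothetical plausibility check, NOT Notepad's real algorithm.
    # An odd byte count cannot be split into 16-bit code units.
    if len(data) % 2 != 0:
        return False
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        # For example, an unpaired surrogate.
        return False
    return True

print(could_be_utf16(b"Well you are speed"))   # 18 bytes
print(could_be_utf16(b"Well you are speeds"))  # 19 bytes
```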

You only get the BOM (0xfffe) if you are silly enough to save it while
it's open in Unicode mode.

That was a test. I wasn't so stupid as to save
to the original file, but to make a copy.

By the way, this has precisely what to do with Python?

I've been known to use Notepad to create Python
source code.

Your source code would have to be trivially short to trigger the
strange behaviour.

Makes you wonder what other edge cases aren't
handled properly.

Makes you wonder why Microsoft doesn't employ
professional programmers.

I'm eagerly awaiting publication of your professional specification
for correctly detecting the encoding of an arbitrary stream of
bytes.

The very presence of an algorithm to detect encoding is a bug.
Files with the .txt extension should always be treated as ANSI
even if they contain binary data. Notepad should never be
allowed to try to decide what the encoding is if the open
dialog has the encoding set to ANSI.
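
A rough Python sketch of the behaviour being asked for: an encoding chosen by the user always wins, and sniffing (here limited to byte order marks) happens only when no choice was made. The function `decode_text` and the use of cp1252 for "ANSI" are assumptions for illustration.

```python
import codecs

def decode_text(data: bytes, encoding=None) -> str:
    # An explicit choice by the user is never second-guessed.
    if encoding is not None:
        return data.decode(encoding)
    # With no choice made, look only for a byte order mark.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")
    # Assumed fallback: treat everything else as "ANSI" (cp1252 here).
    return data.decode("cp1252")

print(decode_text(b"Well you are speed", encoding="cp1252"))
print(decode_text(b"Well you are speed"))
```

Under this rule the 18 byte file opens as typed either way, because nothing in it is a BOM.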
