Re: Try this
- From: John Machin <sjmachin@xxxxxxxxxxx>
- Date: Sun, 16 Sep 2007 16:21:53 -0700
On Sep 17, 8:53 am, "mensana...@xxxxxxx" <mensana...@xxxxxxx> wrote:
On Sep 16, 5:28 pm, John Machin <sjmac...@xxxxxxxxxxx> wrote:
On Sep 17, 7:54 am, "mensana...@xxxxxxx" <mensana...@xxxxxxx> wrote:
On Sep 16, 2:22 pm, Steve Holden <st...@xxxxxxxxxxxxx> wrote:
mensana...@xxxxxxx wrote:
On Sep 16, 1:10 pm, Dennis Lee Bieber <wlfr...@xxxxxxxxxxxxx> wrote:
On Sun, 16 Sep 2007 01:46:34 -0700, GeorgeRXZ <george...@xxxxxxxxx>
declaimed the following in comp.lang.python:
Then Open the Notepad and type the following sentence, and save the
file and close the notepad. Now reopen the file and you will find out
that, Notepad is not able to save the following text line.
Well you are speed
This occurs not only with above sentence but any sentence that has
4 3 3 5 (sequence of characters: Well=4 you=3 are=3 speed=5)
I tried. I also opened the saved file in SciTE...
And the text WAS there...
It is Notepad that can not properly render what it,
itself, saved.
C:\Documents and Settings\mensanator\My Documents>type huh.txt
Well you are speed
Yes, file was saved correctly.
But reopening it shows 9 unprintable characters.
If I copy those to a new file (huh1.txt):
C:\Documents and Settings\mensanator\My Documents>type huh1.txt
?????????
But wait...the new file is 20 characters, not 9.
09/16/2007 01:44 PM 18 huh.txt
09/16/2007 01:54 PM 20 huh1.txt
C:\Documents and Settings\mensanator\My Documents>dump huh.txt
huh.txt:
00000000 5765 6c6c 2079 6f75 2061 7265 2073 7065 Well you are spe
00000010 6564 ed
Here's what it's actually doing:
C:\Documents and Settings\mensanator\My Documents>dump huh1.txt
huh1.txt:
00000000 fffe 5765 6c6c 2079 6f75 2061 7265 2073 .~Well you are s
00000010 7065 6564 peed
One word: Unicode.
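The two dumps above can be reproduced in a few lines of Python. This is only a sketch of the byte arithmetic, not of anything Notepad does internally; the variable names are made up for illustration.

```python
import codecs

# The 18 bytes in huh.txt, exactly as shown in the first dump.
raw = b"Well you are speed"
print(len(raw))  # 18

# Read the same bytes as little-endian UTF-16: each pair of bytes
# becomes one 16-bit code unit, so 18 bytes turn into 9 characters.
misread = raw.decode("utf-16-le")
print(len(misread))  # 9
print([hex(ord(ch)) for ch in misread])

# Saving those 9 characters as "Unicode" writes a 2-byte BOM (ff fe)
# followed by the original 18 bytes, which matches the dump of huh1.txt.
resaved = codecs.BOM_UTF16_LE + misread.encode("utf-16-le")
print(len(resaved))  # 20
print(resaved[:2].hex())  # fffe
```

So the 9 "unprintable characters" and the 20-byte copy are the same 18 bytes, read two at a time and then written back with a BOM in front.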
The "open" and "save" dialogs allow you to specify an encoding.
And the encoding specified was ANSI.
If you
specify Unicode then you will get what you see above.
And if you specify ANSI _before_ you click the file name,
the specification switches to Unicode and has to then
be manually switched back to ANSI.
If you specify ANSI
you will get the text you entered.
It's still a bug in the "open" dialog.
It's more like a bug/feature in its encoding detector.
It is NOT a feature. If I save something as ANSI,
there is no excuse for it not to re-open in ANSI.
It doesn't know that you or anybody else saved it as "ANSI". All it is
seeing is a string of bytes.
If you are silly enough to type in

[that's "\xef\xbb\xbf" repeated a few times]
and save it as "ANSI", it has every excuse to open it as something
else :-)
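For anyone wondering what those bytes are: "\xef\xbb\xbf" is the UTF-8 byte order mark. A small sketch, assuming "ANSI" here means Windows code page 1252 (an assumption, since it depends on the machine's locale):

```python
import codecs

bom = b"\xef\xbb\xbf"
print(bom == codecs.BOM_UTF8)  # True

# Viewed as cp1252 ("ANSI" on a Western European Windows box, assumed here),
# the three bytes are three ordinary printable characters.
as_ansi = bom.decode("cp1252")
print(repr(as_ansi))

# Type that a few times and save as "ANSI": the file is just the BOM repeated.
payload = (as_ansi * 3).encode("cp1252")
print(payload == bom * 3)  # True

# The very same bytes are also perfectly valid UTF-8 (three U+FEFF characters).
as_utf8 = payload.decode("utf-8")
print(len(as_utf8))  # 3
```

Both readings are legal, which is the point: the bytes alone don't say which one was intended.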
I can get it to
switch to Unicode only if there's an even number of characters AND the
line is NOT terminated by CRLF -- add/remove one alpha character, or
hit the enter key at the end of the line, and it won't detect it as
Unicode when you open it again.
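The even/odd part of that observation is easy to check in Python. The helper below is hypothetical and only tests whether the bytes are well-formed UTF-16; it is not a reconstruction of whatever Notepad's detector really does.

```python
def could_be_utf16(data: bytes) -> bool:
    """Return True if data is well-formed little-endian UTF-16 (hypothetical helper)."""
    try:
        data.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


print(could_be_utf16(b"Well you are speed"))      # 18 bytes: True
print(could_be_utf16(b"Well you are speeds"))     # 19 bytes: False, one byte left over
print(could_be_utf16(b"Well you are speed\r\n"))  # 20 bytes: True
```

An odd byte count can never be complete UTF-16, which fits the "add/remove one character" result. The CRLF case is still well-formed UTF-16, so the detector presumably looks at more than validity when it rejects that one.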
You only get the BOM (0xfffe) if you are silly enough to save it while
it's open in Unicode mode.
That was a test. I wasn't so stupid as to save
to the original file, but to make a copy.
By the way, this has precisely what to do with Python?
I've been known to use Notepad to create Python
source code.
Your source code would have to be trivially short to trigger the
strange behaviour.
Makes you wonder what other edge cases aren't
handled properly.
Makes you wonder why Microsoft doesn't employ
professional programmers.
I'm eagerly awaiting publication of your professional specification
for correctly detecting the encoding of an arbitrary stream of
bytes :-)
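To make the difficulty concrete, here is about the only part of encoding detection that is unambiguous: looking for a BOM. This is a minimal sketch with a made-up function name, and it deliberately gives up when there is no BOM rather than guessing.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM (ff fe 00 00) starts with the
# UTF-16 LE BOM (ff fe), so the order of this table matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(b"\xff\xfeWell you are speed"))  # utf-16-le (huh1.txt)
print(sniff_bom(b"Well you are speed"))          # None (huh.txt)
```

For huh.txt the honest answer is None. Anything beyond that is a heuristic, and a heuristic can always be fooled by some input.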