Re: Library function to detect UTF-8 streams without BOM
- From: "Remy Lebeau (TeamB)" <no.spam@xxxxxxxxxxx>
- Date: Fri, 14 Dec 2007 11:20:10 -0800
"marek jedlinski" <marekjed@xxxxxxxxxxxxxxxxx> wrote in message
news:ths4m3t6ljucpt12ne9638k5sc2bvtq6sp@xxxxxxxxxx
> I've been testing several Unicode-capable shareware editors for
> Windows (can't find one that's quite right for my work), and none
> has any problems detecting BOM-less UTF-8, even in non-xml/html
> files, where they cannot rely on the encoding specified in the file
> itself.
Unless the encoding is specified by a BOM or explicitly (and accurately)
inside the content, it has to be determined by analyzing the bytes of the
content and guessing which encoding might be in use. There are some blog
posts on MSDN that describe how Notepad tries to auto-detect the encoding,
for instance:
Some files come up strange in Notepad
http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx
The Notepad file encoding problem, redux
http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx
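
To make the "analyze and guess" idea concrete, here is a minimal sketch in Python of one common heuristic: check for a BOM first, then test whether the bytes form strictly valid UTF-8. This is not Notepad's algorithm and not any particular library's API; the function name guess_encoding and the labels it returns are made up for illustration. It assumes the whole buffer is available, since a buffer cut off in the middle of a multi-byte sequence would fail validation.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Best-effort guess (hypothetical helper, labels are illustrative).

    Returns one of: 'utf-8-sig', 'utf-16', 'ascii', 'utf-8', 'unknown'.
    """
    # An explicit BOM settles the question without any guessing.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # No BOM: see whether the bytes are well-formed UTF-8.
    # The strict decoder rejects stray continuation bytes,
    # truncated sequences, and overlong encodings.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "unknown"  # probably some legacy 8-bit code page

    # Pure 7-bit data is valid UTF-8 but proves nothing either way.
    if all(b < 0x80 for b in data):
        return "ascii"

    # At least one valid multi-byte sequence and no invalid ones.
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("zażółć".encode("utf-8")))
    print(guess_encoding("zażółć".encode("cp1250")))
```

The result is still a guess: text in a legacy code page can, by coincidence, form valid UTF-8 sequences, although this becomes less likely as the amount of non-ASCII text grows.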
Gambit