Re: Defacto standard string library
- From: Stephen Sprunk <stephen@xxxxxxxxxx>
- Date: Mon, 05 Jan 2009 11:29:54 -0600
Richard Tobin wrote:
In article <871vvi11vi.fsf@xxxxxxxxxxxxxxxxxxxx>,
Phil Carmody <thefatphil_demunged@xxxxxxxxxxx> wrote:
Technically, you can't throw them away even if you do know the file is
UTF-8 (or UTF-16), because it's possible that there was no BOM and the
user content actually started with a ZWNBSP...
Possible, but recommended against:
http://unicode.org/faq/utf_bom.html#bom7
<<<
Q: I am using a protocol that has BOM at the start of text. How do I
represent an initial ZWNBSP?
A: Use U+2060 WORD JOINER instead. [MD]
Or use the sequence twice: the first will be interpreted as a BOM, the
second as a ZWNBSP. But why on earth would you want it anyway?
A leading ZWNBSP in real content is extremely unlikely, which is one of the reasons the ZWNBSP was chosen as the BOM. The particular code point for the ZWNBSP (0xFEFF) was chosen, IIRC, because its UTF-16LE and UTF-16BE encodings (FF FE and FE FF) are invalid UTF-8, thus distinguishing exactly which of the three UTFs is in use, but it can't definitively tell you that the text isn't in some other encoding.
S