Re: How to detect text file encoding in Perl
- From: "Alan J. Flavell" <flavell@xxxxxxxxxxxxxxxxx>
- Date: Sun, 21 May 2006 10:12:07 +0100
On Sun, 21 May 2006, corff@xxxxxxxxxxxxxxxxxx wrote:
> Google is probably your friend. If not: <B>yte <O>rder <M>ark.
http://www.unicode.org/faq/utf_bom.html#BOM
> store your data as UTF-8, or your data _is_ UTF-8, you'll see that after
> storing the bytecount is two bytes more because the bytes 0xff 0xef get
> prepended automatically,
The BOM is the relevant encoding of the Unicode character U+FEFF. No
way is it 0xff 0xef. The various encoded byte patterns are shown in
that Unicode FAQ, and in utf-8 it's *three* bytes.
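To make the correction concrete: U+FEFF is a single code point, but its serialized byte pattern depends on the encoding form. The following is a minimal Python sketch (Python is used only as a compact illustration; the thread itself concerns Perl) that prints the pattern for each form using the standard codecs. The explicit-endianness codec names are chosen so that Python does not add a BOM of its own.

```python
# Print the byte pattern of U+FEFF (the BOM) in each Unicode encoding form.
BOM = "\ufeff"

# Explicit-endianness codec names, so the codec does not prepend its own BOM.
for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    encoded = BOM.encode(name)
    print(f"{name:10} {len(encoded)} bytes: {encoded.hex(' ').upper()}")

# utf-8      3 bytes: EF BB BF
# utf-16-be  2 bytes: FE FF
# utf-16-le  2 bytes: FF FE
# utf-32-be  4 bytes: 00 00 FE FF
# utf-32-le  4 bytes: FF FE 00 00
```

Only the UTF-8 row is three bytes; no encoding form produces 0xff 0xef.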
> in order to tell the software which byte order is to be expected.
"No, a BOM can be used as a signature no matter how the Unicode text
is transformed"
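Because the BOM works as a signature in every encoding form, a program can sniff the first few bytes of a file to guess its encoding, which is the original question of this thread. Below is a minimal Python sketch of that check; the names `detect_bom` and `decode_with_bom` are hypothetical helpers, and the UTF-8 fallback is an assumption. Two caveats: the UTF-32-LE signature (FF FE 00 00) begins with the UTF-16-LE one (FF FE), so longer signatures must be tested first, and a UTF-16-LE text whose first character is U+0000 would be misread as UTF-32-LE. A missing BOM proves nothing about the encoding.

```python
import codecs

# Known BOM signatures, longest first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE), so it must be tested first.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0).

    This is only a signature check: text without a BOM may still be
    UTF-8, UTF-16, or something else entirely.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM if present (stripping it); otherwise assume `default`."""
    name, skip = detect_bom(data)
    return data[skip:].decode(name or default)


print(detect_bom(b"\xef\xbb\xbfhello"))         # ('utf-8', 3)
print(decode_with_bom(b"\xff\xfeh\x00i\x00"))   # hi
```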
> This makes sense with UCS-2 Unicode (the "original" Unicode
> encoding)
Yes, but "UCS-2" is out of date:
http://www.unicode.org/faq/basic_q.html#23
The utf-16 encoding form is its present counterpart.
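The practical difference is that UCS-2 is a fixed two bytes per character and stops at U+FFFF, while UTF-16 reaches the supplementary planes (U+10000 to U+10FFFF) through surrogate pairs. A minimal Python sketch of the standard surrogate computation, checked against the built-in codec (the helper name `utf16_surrogates` is illustrative):

```python
def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into a
    UTF-16 (high, low) surrogate pair. UCS-2 cannot represent these at all."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


cp = 0x1F600
high, low = utf16_surrogates(cp)
print(f"U+{cp:X} -> {high:04X} {low:04X}")            # U+1F600 -> D83D DE00
print(chr(cp).encode("utf-16-be").hex(" ").upper())   # D8 3D DE 00
```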
> but not with UTF-8 (8-bit transformation format of Unicode) because
> the characters encoded in UTF-8 are self-synchronizing and no
> information about byte order is needed.
Nevertheless, the Unicode FAQ points out that utf-8 can usefully
start with a BOM as an encoding signature.
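The quoted point that UTF-8 needs no byte-order information can be seen directly. Every continuation byte has the bit pattern 10xxxxxx, so a reader dropped at an arbitrary offset can back up to the start of a character without knowing anything else, and the bytes are the same on every machine. That is why a UTF-8 BOM (EF BB BF) can only ever act as the signature the FAQ describes. A minimal Python sketch (the helper name `char_start` is illustrative):

```python
def char_start(data: bytes, i: int) -> int:
    """Return the index of the first byte of the UTF-8 character containing
    byte offset i, by stepping back over continuation bytes (10xxxxxx)."""
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


text = "a\u00e9\u20ac"          # 'a' (1 byte), e-acute (2 bytes), euro sign (3 bytes)
data = text.encode("utf-8")     # 61 C3 A9 E2 82 AC
print([char_start(data, i) for i in range(len(data))])   # [0, 1, 1, 3, 3, 3]
```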
> In contrast, other programs behaving correctly frequently complain
> if the BOM appears where it simply doesn't belong.
Except that it is not inherently incorrect for it to appear at the
beginning of a utf-8 stream - but see the cited FAQ for details.
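Python's standard library illustrates the distinction drawn here: its `utf-8-sig` codec accepts and drops one optional BOM at the very start of the data, while plain `utf-8` keeps it as the character U+FEFF. A BOM that ends up mid-stream, for example after naively concatenating two files that each begin with one, survives as a stray U+FEFF either way; that is the kind of misplaced BOM a program may reasonably object to. A minimal sketch with made-up sample data:

```python
BOM = b"\xef\xbb\xbf"
part1 = BOM + "first\n".encode("utf-8")
part2 = BOM + "second\n".encode("utf-8")

# A leading BOM is an optional signature: utf-8-sig strips it, utf-8 keeps it.
print(repr(part1.decode("utf-8-sig")))   # 'first\n'
print(repr(part1.decode("utf-8")))       # '\ufefffirst\n'

# Concatenating the raw bytes leaves the second BOM mid-stream, where it is
# no longer a signature but a stray U+FEFF character.
joined = (part1 + part2).decode("utf-8-sig")
print(repr(joined))                      # 'first\n\ufeffsecond\n'

# One remedy: strip each part's signature before joining.
clean = "".join(p.decode("utf-8-sig") for p in (part1, part2))
print(repr(clean))                       # 'first\nsecond\n'
```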
Seems to me you would have done well to read that FAQ yourself, before
putting misleading opinions on the record.
regards
--
Beware of negative easements.