Re: Judge the encode systm used by the file.
- From: richard@xxxxxxxxxxxxxxx (Richard Tobin)
- Date: 29 Oct 2008 16:25:28 GMT
In article <K9ICop.LFE@xxxxxx>, Dik T. Winter <Dik.Winter@xxxxxx> wrote:
>> I think this is *very* rare. An English language file that uses a few
>> accented characters from 8859-something will not be legal UTF-8,
>> because in UTF-8 characters above 127 always come in groups of at
>> least two.
>
> Who is talking about English language files?

See my other response.

>> I would be interested to see a real-life 8859 file that's also legal
>> UTF-8.
>
> Start the other way. Every UTF-8 file is also a correct 8859 file. But if
> you want to omit the higher control characters, then every UTF-8 file that
> does not contain a byte in the range 0x80-0x9F is a correct 8859 file.

I know that. But in real life, the chance of a file being 8859
if it is legal as UTF-8 is negligible. You can distinguish the
two with high reliability by testing for legality as UTF-8.
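
For concreteness, here is a rough sketch of what such a legality test might look like in C. The name is_legal_utf8 is made up for illustration, and the checks follow the usual well-formedness rules (no overlong forms, no surrogates, nothing above U+10FFFF); treat it as one possible way to do it, not a reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. A heuristic for "probably UTF-8, not 8859", nothing more. */
int is_legal_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0, k, need;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char b = buf[i];
        /* Allowed range for the first continuation byte; narrowed below
         * to exclude overlong forms, surrogates and values > U+10FFFF. */
        unsigned char lo = 0x80, hi = 0xBF;

        if (b <= 0x7F) {            /* plain ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2;
            if (b == 0xE0) lo = 0xA0;       /* reject overlong 3-byte forms */
            else if (b == 0xED) hi = 0x9F;  /* reject UTF-16 surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3;
            if (b == 0xF0) lo = 0x90;       /* reject overlong 4-byte forms */
            else if (b == 0xF4) hi = 0x8F;  /* reject values above U+10FFFF */
        } else {
            return 0;   /* 0x80-0xC1 and 0xF5-0xFF cannot start a sequence */
        }

        if (len - i - 1 < need)     /* sequence truncated at end of buffer */
            return 0;
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return 0;
        for (k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)   /* must be 10xxxxxx */
                return 0;
        i += need + 1;
    }
    return 1;
}
```

A lone 8859-1 accented letter such as 0xE9 fails this test because no continuation bytes follow it, while the two-byte sequence 0xC3 0xA9 passes. The latter is also a legal 8859-1 file, which is exactly the rare ambiguous case discussed above.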
-- Richard
--
Please remember to mention me / in tapes you leave behind.