Re: Judge the encoding system used by the file.



richard@xxxxxxxxxxxxxxx (Richard Tobin) wrote:

> Richard Bos <rlb@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Possibly, but are you willing to rely on this, given the thousands of
> > languages out there, most of them, _unlike_ English, written in a Latin
> > script which uses diacritics to a greater or lesser degree?
>
> Yes. It's very unlikely that all the sequences of ISO 8859 characters used
> in such a document will be legal UTF-8.
>
> The heuristic is: if the file contains bytes >= 128, and it would be
> legal UTF-8, then it's very likely that it *is* UTF-8. As I said,
> I would be interested if you can come up with any real document for
> which this heuristic fails.

*Shrug* You speak English, and you're willing to take that risk. I speak
a language which _does_ use diacritics, and I'm not.

Richard
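For anyone who wants to try the heuristic Richard Tobin describes (the data contains at least one byte >= 128, and the whole of it is still well-formed UTF-8), here is a minimal C sketch. It is not code from either poster. The names `is_valid_utf8`, `guess_encoding` and `enum enc_guess` are hypothetical, and "well-formed" is assumed to mean the RFC 3629 rules: no overlong forms, no surrogate code points, nothing above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration, not code from the thread. */

/* Returns 1 if buf[0..len) is well-formed UTF-8 as defined by RFC 3629:
   no overlong encodings, no surrogates (U+D800..U+DFFF), and no code
   points above U+10FFFF. Returns 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need, k;        /* number of continuation bytes expected */
        unsigned long cp;      /* code point being decoded */

        if (c < 0x80) {        /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            return 0;          /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }

        if (len - i - 1 < need)
            return 0;          /* sequence truncated at end of buffer */

        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;      /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong 3- and 4-byte forms, surrogates, and values past
           U+10FFFF. (Two-byte overlongs are excluded by the 0xC2 bound.) */
        if (need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF))
            return 0;

        i += need + 1;
    }
    return 1;
}

enum enc_guess {
    ENC_ASCII_ONLY,    /* no byte >= 128: the test says nothing either way */
    ENC_LIKELY_UTF8,   /* high bytes present and all of it well-formed UTF-8 */
    ENC_NOT_UTF8       /* high bytes present but not valid UTF-8 (ISO 8859 or other) */
};

/* The heuristic from the thread: if the data contains bytes >= 128 and
   is nevertheless legal UTF-8, guess that it is UTF-8. */
static enum enc_guess guess_encoding(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] >= 0x80)
            return is_valid_utf8(buf, len) ? ENC_LIKELY_UTF8 : ENC_NOT_UTF8;
    return ENC_ASCII_ONLY;
}
```

For example, the ISO 8859-1 bytes of "café" (63 61 66 E9) are rejected, because 0xE9 announces a three-byte sequence that the data never completes, while the UTF-8 bytes (63 61 66 C3 A9) are accepted. Richard Bos's worry is also easy to reproduce: the two ISO 8859-1 characters "Ã©" are stored as C3 A9, which is exactly the UTF-8 encoding of "é", so text made only of such pairs would be misclassified. The heuristic's bet is that real 8859 text almost never consists solely of such pairs.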