Re: Is there a better way to convert foreign characters?



bugbear <bugbear@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Jürgen Exner wrote:
bugbear <bugbear@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Jürgen Exner wrote:
First of all, how would you react if someone were mangling your name?
There is no "English version" of my first name.
But an English speaker might well search for "Jurgen Exner"
and hope to find you.

And my name may come up as the closest hit with a 91% match.

Accent folding is a key component of "loose" matching.
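
For readers who want to see what accent folding means in practice, here is a minimal sketch in Python. It is not the OP's code and not anything posted in this thread; the function names fold_accents and similarity are made up for illustration. It assumes that "folding" means decomposing to Unicode NFD and dropping the combining marks, and that a "loose" match is scored with the standard library's difflib.

```python
import difflib
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    # NFD turns "ü" into "u" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] as computed by difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a, b).ratio()


query = "Jurgen Exner"
stored = "Jürgen Exner"

# Without folding, the two strings differ in one character out of twelve.
print(round(similarity(query, stored), 3))
# With folding applied to the stored name, the match becomes exact.
print(similarity(query, fold_accents(stored)))
```

Whether a search engine scores the unfolded pair the same way depends entirely on its own metric; difflib is used here only because it ships with Python.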

Having taken a second, closer look, you are right. The OP's character set
is indeed restricted to just simple accented characters and doesn't
include any of the more complex or additional characters found in the
other Latin-X sets.

Of course, accent folding only helps searching in a limited context.

If you have (e.g.) Japanese, Thai, Arabic data,
you're stuffed.

Not even talking about those, but about simple Scandinavian, Baltic, and
even German or Polish letters.
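
To illustrate the point about European letters, here is a small Python sketch, assuming the folding step is the common "decompose to NFD and drop combining marks" approach (fold_accents is a hypothetical helper, not something from this thread). Letters such as ø, ß, Ł and Þ have no canonical decomposition in Unicode, so this kind of folding leaves them untouched.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample words chosen for illustration only.
for word in ["Jürgen", "Søren", "Straße", "Łódź", "Þór"]:
    folded = fold_accents(word)
    # Report whether any non-ASCII character survived the folding step.
    status = "ok" if folded.isascii() else "still non-ASCII"
    print(f"{word} -> {folded} ({status})")
```

Even where the folding does produce plain ASCII, the result is not necessarily what a native speaker would write: German conventionally transliterates "ü" as "ue", so "Jurgen" and "Juergen" would both need to match.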

jue


