Re: Function for removing Accents?



"Chris Uppal" <chris.uppal@xxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:4469c6fa$0$639$bed64819@xxxxxxxxxxxxxxxxxxxx
Luc The Perverse wrote:

Right now I have a crude hard-coded method using a series of replaceAll's
for removing accents and converting them to their approximate non-accented
equivalents.

I doubt if that is either meaningful or possible in general. It is
certainly not easy.

My wife does not know to press ALT 0 2 4 1 for an ñ, so she would have a hard
time searching for Piña Coladas if she wanted to hear that Garth Brooks
song.

If you try it at all, and are not satisfied with a handful of hardwired
mappings, then you'll probably have to get deeply into Unicode. See:
http://www.unicode.org/reports/tr15/index.html
for one of the Unicode technical reports which discusses decomposition of
characters into base characters plus various kinds of diacritical marks.
You could presumably filter out characters representing diacritical marks,
leaving the character which was qualified by the marks in place.
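
For what it's worth, the decompose-then-filter idea described above can be
sketched in a few lines of Java. This is only a rough sketch, not anything
from the thread: it assumes a JDK that ships java.text.Normalizer (Java 6 or
later), and the class and method names (AccentStripper, stripAccents) are
made up for illustration. It normalizes to NFD, the canonical decomposition
form from TR15, and then drops every character in the Unicode "mark"
category.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper; the name is illustrative only.
final class AccentStripper {
    // \p{M} matches Unicode mark characters, which includes the
    // combining diacritics produced by canonical decomposition.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    static String stripAccents(String input) {
        // NFD splits a precomposed character such as U+00F1 (n with tilde)
        // into the base letter 'n' followed by U+0303 (combining tilde).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the marks, leaving only the base characters.
        return MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Piña Coladas")); // prints "Pina Coladas"
    }
}
```

Letters that have no canonical decomposition, such as ø or ß, pass through
unchanged, so a small hardwired table would still be needed for those.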

AH! I don't want to "learn" Unicode. Especially after looking at that
link!

Perhaps I should have explained in more detail the scope of my question.

--
LTP

:)

