Re: Function for removing Accents?
- From: Thomas Weidenfeller <nobody@xxxxxxxxxxxxxxxx>
- Date: Tue, 16 May 2006 17:16:01 +0200
Chris Uppal wrote:
> You could probably speed up this process considerably by using a pre-existing
> Unicode package such as ICU:
I am not sure :-) My understanding of ICU, after checking the documentation, is that it doesn't do the destructive thing the OP might want to do. It looks more as if the authors of ICU tried very hard to get every aspect of Unicode right. Mapping an accented character to a single non-accented "equivalent" is certainly not right in the scope of Unicode, nor in the scope of languages that need more than ASCII.
The effort to invest in a solution also depends on how good the solution has to be. Since the original text is going to be butchered anyway, I don't see a need for 100% accuracy.
So, scripting the parsing of the UCD (the Unicode Character Database) to find the interesting values should not take that much time; I would guess less than an hour. That should include a scripted check of the decomposition values for these "bad" accents (probably code points starting at 0x300 up to some value I forgot). The result should be a map covering a bunch of characters.
Some more scripting to get that output into a Java data structure, add a lookup method, compile, and that's it.
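A rough sketch of what that could look like, done directly in Java instead of with a separate script. Everything here is an assumption on my part, not tested against a real UCD file: the class and method names (AccentMap, build, strip) are made up, the input is assumed to be lines in the semicolon-separated UnicodeData.txt layout (code point in field 0, decomposition in field 5), and the upper bound 0x36F is simply the end of the Combining Diacritical Marks block, which may or may not be the right cut-off for the OP. It needs Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

final class AccentMap {
    // Assumed range of "bad" accents: Combining Diacritical Marks, U+0300..U+036F.
    static final int FIRST_MARK = 0x0300;
    static final int LAST_MARK = 0x036F;

    /** Builds an "accented code point -> base code point" map from UnicodeData.txt-style lines. */
    static Map<Integer, Integer> build(Iterable<String> ucdLines) {
        Map<Integer, Integer> map = new HashMap<>();
        for (String line : ucdLines) {
            String[] fields = line.split(";", -1);
            if (fields.length < 6) {
                continue; // not a UnicodeData.txt record
            }
            String decomposition = fields[5].trim();
            // Skip characters without a decomposition, and compatibility
            // decompositions, which start with a tag such as "<compat>".
            if (decomposition.isEmpty() || decomposition.startsWith("<")) {
                continue;
            }
            String[] parts = decomposition.split("\\s+");
            if (parts.length < 2) {
                continue; // singleton decomposition, no accent involved
            }
            // Accept only "base + marks" where every mark is in the assumed range.
            boolean onlyMarks = true;
            for (int i = 1; i < parts.length; i++) {
                int cp = Integer.parseInt(parts[i], 16);
                if (cp < FIRST_MARK || cp > LAST_MARK) {
                    onlyMarks = false;
                    break;
                }
            }
            if (onlyMarks) {
                map.put(Integer.parseInt(fields[0], 16), Integer.parseInt(parts[0], 16));
            }
        }
        // A base may itself be an accented character (two accents on one letter),
        // so follow the chain until a character without an entry is reached.
        for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
            int base = entry.getValue();
            while (map.containsKey(base)) {
                base = map.get(base);
            }
            entry.setValue(base);
        }
        return map;
    }

    /** Lookup method: replaces every mapped code point in s by its base character. */
    static String strip(String s, Map<Integer, Integer> map) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(map.getOrDefault(cp, cp)));
        return sb.toString();
    }
}
```

Feed build() the lines of the UCD file once at start-up and keep the map around; strip() is then the lookup. Characters that have no canonical "base + accent" decomposition (the Polish l with stroke, for example) are left untouched, which fits the "no need for 100% accuracy" approach.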
> Incidentally, why is ICU never mentioned around here?
Probably because people don't know about it (I didn't). And probably because it solves problems that not many people run into every day.
/Thomas
--
The comp.lang.java.gui FAQ:
ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq
http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/