Re: Function for removing Accents?



"Thomas Weidenfeller" <nobody@xxxxxxxxxxxxxxxx> wrote in message
news:e4cqbi$k9l$1@xxxxxxxxxxxxxxxxxxxxxxxxx
Chris Uppal wrote:
You could probably speed up this process considerably by using a
pre-existing Unicode package such as ICU:

I am not sure :-) My understanding of ICU after checking the documentation
is that it doesn't do the destructive thing the OP might want to do. It
looks more as if the authors of ICU tried very hard to get every aspect of
Unicode right. Mapping an accented character to a single non-accented
"equivalent" is certainly not right in the scope of Unicode, and also not
in the scope of non-ASCII languages.

I did say approximate representation.
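
For what it's worth, here is a minimal sketch of one way to get such an approximate representation without ICU. It assumes a JDK that ships java.text.Normalizer (Java 6 or later), which is not something anyone in this thread has confirmed using. The class and method names are made up for illustration. It decomposes each character (NFD) and then throws away the combining marks, which is exactly the "destructive thing" discussed above.

```java
import java.text.Normalizer;

// Hypothetical helper, not taken from the thread.
final class AccentStripper {

    // Decompose to NFD so that e.g. U+00E9 becomes 'e' + U+0301,
    // then delete every combining mark (Unicode general category M).
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source file encoding does not matter.
        System.out.println(stripAccents("R\u00e9sum\u00e9 na\u00efve")); // prints "Resume naive"
        System.out.println(stripAccents("\u00e6nima")); // unchanged: U+00E6 has no decomposition
    }
}
```

Note that ligatures and stroked letters such as U+00E6 or U+00F8 have no canonical decomposition, so they pass through untouched and still need a hand-written mapping.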

The thing that annoys me the most is the song by Tool named ænima.

The effort to invest in a solution also depends on how good the solution
has to be. Since the original text is anyhow supposed to be butchered, I
don't see a reason for 100% accuracy.

To be honest, the solution that I have now works just fine. I have hardcoded
replaceAll calls for every accented character that appears in my file names.
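
A rough sketch of what such a hardcoded approach might look like, for anyone finding this thread later. The table entries, class name and method name below are illustrative assumptions, not the actual list used for these files. It uses String.replace rather than replaceAll, since the keys are literal characters and no regex is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a fixed replacement table for file names.
final class FileNameAsciifier {

    // Illustrative entries only; extend with whatever characters occur in your own files.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u00e6", "ae"); // ae ligature, lower case
        REPLACEMENTS.put("\u00c6", "AE"); // ae ligature, upper case
        REPLACEMENTS.put("\u00f8", "o");  // o with stroke
        REPLACEMENTS.put("\u00df", "ss"); // sharp s
        REPLACEMENTS.put("\u00e9", "e");  // e with acute
    }

    // Apply every literal replacement in turn; characters not in the table are kept.
    static String asciify(String name) {
        String result = name;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(asciify("\u00e6nima.mp3")); // prints "aenima.mp3"
    }
}
```

The upside of a table like this is that one character can map to several (U+00E6 to "ae"), which a pure strip-the-accents pass cannot do.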

Incidentally, why is ICU never mentioned around here?

Probably because people don't know about it (I didn't). And probably
because it solves problems not many people have each day.

You are not the only one who had not heard of it.

Unicode is such a beautiful thing - I wonder why it takes so long to catch
on?

--
LTP

:)

