Re: character mapping functions and UNICODE : remove accents, case, etc
From: Alan J. Flavell (flavell_at_ph.gla.ac.uk)
Date: 10/23/03
- In reply to: An. Valula: "Re: character mapping functions and UNICODE : remove accents, case, etc"
Date: Thu, 23 Oct 2003 19:18:40 +0100
On Thu, 23 Oct 2003, An. Valula floated out upon a sea of TOFU:
> thank you for your answer, but, no, I do not want to remove bold or
> paragraph marks.
But that *is* what the term "rich text" format normally refers to -
whether used in the generic sense or in particular reference to
Microsoft's "RTF" interchange specification.
> I want to convert "rich" text to "poor" text.
Not really, and that's why you confused the previous respondent. You
need some better term. (Try a glossary of text processing if you
don't believe me.)
> There must be someone else who wants to compare strings without diacritical
> signs ?!
Is there a problem? You already know one solution.
> > does anyone out there know about perl capabilities to convert rich
> > text, such as "étrangères" to "etrangere" (remove accents)?
> > Of course, tr/éè/ee/ would do, but I look for sth better: you do not
> > tr/a-z/A-Z/ for uc(), do you?
You probably should note that your tr/// and your uc() perform
*different* operations, in general - also depending on the locale
setting.
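
To make that difference concrete, here is a minimal sketch. It assumes the script is saved as UTF-8 (hence the "use utf8") so that the literal is a character string, and the variable names are mine, for illustration only:

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

my $word = "étrangères";

# tr/// maps only the characters you list, here the ASCII range a-z,
# so the accented letters pass through untouched.
(my $by_tr = $word) =~ tr/a-z/A-Z/;

# uc() on a character string applies the case mapping for every
# letter that has one, accented letters included.
my $by_uc = uc $word;

binmode STDOUT, ':encoding(UTF-8)';
print "tr///: $by_tr\n";
print "uc():  $by_uc\n";
```

On a byte string, or under "use locale", uc() may behave differently again, which is the locale dependence mentioned above.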
Anyhow, I don't have an answer to your requirement, other than the
obvious one. Well, perhaps I do: you could "do the Unicode
decomposition" thing, but it would seem distinctly inefficient
compared to a tr///.
Have a look at e.g. http://www.perldoc.com/perl5.8.0/pod/perlretut.html
and see whether you really want to fight this via Unicode-style regex
features. If you want to be sure of covering accents that you've
never even heard of, then I guess that's the way to go, but if you're
just looking for the usual Western-European accents then me, I'd go
with the tr/// I reckon. But this is all supposition - it's not a
requirement which I've needed myself.
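
For comparison, the two approaches might be sketched as follows. This is untested supposition on the same footing as the rest: it assumes Perl 5.8 with the bundled Unicode::Normalize module and a UTF-8 source file, and the sub names strip_marks and strip_western are made up for the illustration.

```perl
use strict;
use warnings;
use utf8;                          # assumption: source file saved as UTF-8
use Unicode::Normalize qw(NFD);    # bundled with Perl 5.8

# General approach: decompose each character into base letter plus
# combining marks, then delete the nonspacing marks (property Mn).
sub strip_marks {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

# Narrow approach: a fixed tr/// table covering only the accents you
# expect. The list here is illustrative, not exhaustive.
sub strip_western {
    my ($text) = @_;
    $text =~ tr/àâäéèêëîïôöùûüç/aaaeeeeiioouuuc/;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks("étrangères"), "\n";
print strip_western("étrangères"), "\n";
```

Neither is complete: letters that have no canonical decomposition (a slashed o, say) survive the first, and anything missing from the table survives the second.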