Re: replace chars



Tom Phoenix wrote:
On Dec 26, 2007 3:05 AM, Octavian Rasnita <orasnita@xxxxxxxxx> wrote:
I want to replace some special characters with their corresponding Western
European chars, for example ă with a, â with a, ş with s, ţ with t, î with i
and so on.

I thought that all those characters were included in the Western European character set ISO-8859-1, and if so, your requirement makes no sense. Do you possibly mean corresponding ASCII characters?

Could you please recommend a module that can do this?

You might be able to do what you want with Encode.

http://perldoc.perl.org/Encode.html

Might he? How?

The Swedish alphabet contains three non-ASCII characters: å, ä and ö. To my knowledge, there is no official encoding scheme that converts them to a, a and o respectively. That's natural, since 'å' is a completely different character from 'a', and so on.

Sometimes the special Swedish characters are converted in an English context, based on how they are pronounced, like this:

å -> ou
ä -> ae
ö -> oe

I believe the OP will need to identify all the characters he would like to see converted, and code the conversion rules himself using the tr/// or s/// operator.
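
As a rough sketch of what such hand-written rules could look like: tr/// handles one-to-one replacements, while s/// with a lookup hash handles replacements that expand to more than one character. The sub names, the %expand table and the Romanian character list are assumptions made for illustration (the OP did not give a complete list), and the script is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;    # the source code below contains literal non-ASCII characters

# Hypothetical helper: one-to-one replacements with tr///.
# Each character in the first list maps to the character at the
# same position in the second list.
sub to_ascii_simple {
    my ($text) = @_;
    $text =~ tr/ăâîşţĂÂÎŞŢ/aaistAAIST/;
    return $text;
}

# Hypothetical table for replacements that expand to two characters,
# following the Swedish conventions listed above.
my %expand = ('å' => 'ou', 'ä' => 'ae', 'ö' => 'oe');

sub to_ascii_expanded {
    my ($text) = @_;
    my $class = join '', keys %expand;      # build the character class from the table
    $text =~ s/([$class])/$expand{$1}/g;    # look up each matched character
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_ascii_simple('ţară'), "\n";        # prints "tara"
print to_ascii_expanded('smörgås'), "\n";   # prints "smoergous"
```

Input read from a file or socket would need to be decoded into Perl character strings first (for example with Encode's decode()), otherwise tr/// and s/// would operate on raw bytes rather than characters.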

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl