Re: ASCII to UTF-8
- From: "Maarten Wiltink" <maarten@xxxxxxxxxxxxxxxxxx>
- Date: Mon, 21 Nov 2005 22:55:23 +0100
"Tom de Neef" <tdeneef@xxxxxxxx> wrote in message
news:438209cc$0$11066$e4fe514c@xxxxxxxxxxxxxxxxx
> I received an XML file with header
> <?xml version="1.0" encoding="UTF-8"?>
>
> The browser can't read the file because there are diacritics in it,
> coded in ASCII.
ASCII covers only code points 0 through 127. You probably mean ISO-8859-1.
> When I change the header to
> <?xml version="1.0" encoding="ascii"?>
That should be "us-ascii".
> the browser accepts the file and displays it correctly.
Probably because it falls back to windows-1252 in the face of an
encoding it doesn't recognise at all.
> But... the file is to be imported into Borland's Translation Manager and
> that one only accepts UTF-8, so I can't change the encoding string.
>
> Is there an easy way in which I could convert the file from ASCII to
> UTF-8?
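type
  TByteArray = array of Byte;

{ Encodes a single Latin-1 (ISO-8859-1) character as UTF-8.
  An AnsiChar never exceeds $FF, so at most two bytes are needed. }
function UTF8Encode(const C: AnsiChar): TByteArray;
begin
  if C < #$80 then
  begin { 0xxxxxxx: ASCII passes through unchanged }
    SetLength(Result, 1);
    Result[0] := $00 or ((Ord(C) shr 0) and $7F);
  end
  else { $80..$FF always fits the two-byte form, which reaches up to $7FF }
  begin { 110xxxxx 10xxxxxx }
    SetLength(Result, 2);
    Result[0] := $C0 or ((Ord(C) shr 6) and $1F);
    Result[1] := $80 or ((Ord(C) shr 0) and $3F);
  end;
end;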
Just to demonstrate the mapping, mind you. It's of course much more
practical to transform a whole string in place, inserting one extra byte
for every character from #$80 upwards.
Do keep in mind that UTF-8 is an _encoding_. The resulting string no
longer contains characters (of whatever length); it contains bytes.
So in-place transformation is practical only on AnsiChar strings.
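A minimal sketch of that in-place transformation, assuming the input is Latin-1 (ISO-8859-1) held in an AnsiString; the procedure name Latin1ToUTF8InPlace is hypothetical. Bytes $80..$9F are treated as Latin-1 control characters, so if the file is really Windows-1252 (euro sign at $80, curly quotes at $91..$94) those would need a lookup table first. Rather than calling Insert once per character, which shifts the whole tail each time, it counts the non-ASCII bytes, grows the string once with SetLength, and then fills it from the back so no byte is overwritten before it has been read.

```pascal
{ Hypothetical helper: converts a Latin-1 AnsiString to UTF-8 in place.
  Each byte >= $80 becomes two bytes; ASCII bytes are left as they are. }
procedure Latin1ToUTF8InPlace(var S: AnsiString);
var
  OldLen, NewLen, Src, Dst: Integer;
  B: Byte;
begin
  OldLen := Length(S);
  NewLen := OldLen;
  for Src := 1 to OldLen do
    if Ord(S[Src]) >= $80 then
      Inc(NewLen);                { one extra byte per non-ASCII char }
  if NewLen = OldLen then
    Exit;                         { pure ASCII is already valid UTF-8 }

  SetLength(S, NewLen);           { grow once; the old bytes stay at the front }
  Dst := NewLen;
  for Src := OldLen downto 1 do   { walk backwards so Dst never passes Src }
  begin
    B := Ord(S[Src]);
    if B < $80 then
    begin
      S[Dst] := AnsiChar(B);                      { 0xxxxxxx }
      Dec(Dst);
    end
    else
    begin
      S[Dst] := AnsiChar($80 or (B and $3F));     { 10xxxxxx }
      S[Dst - 1] := AnsiChar($C0 or (B shr 6));   { 110xxxxx }
      Dec(Dst, 2);
    end;
  end;
end;
```

For the file in question, one could read it into an AnsiString, run it through this procedure and write the bytes back out unchanged; the header can then keep encoding="UTF-8".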
Groetjes,
Maarten Wiltink