Re: ASCII to UTF-8



"Tom de Neef" <tdeneef@xxxxxxxx> wrote in message
news:438209cc$0$11066$e4fe514c@xxxxxxxxxxxxxxxxx

> I received an XML file with header
> <?xml version="1.0" encoding="UTF-8"?>
>
> The browser can't read the file because there are diacritics in it,
> coded in ASCII.

ASCII covers only code points 0 through 127. You probably mean ISO-8859-1.


> When I change the header to
> <?xml version="1.0" encoding="ascii"?>

That should be "us-ascii".


> the browser accepts the file and displays it correctly.

Probably because it falls back to windows-1252 in the face of an
encoding it doesn't recognise at all.


> But... the file is to be imported into Borland's Translation Manager and
> that one only accepts UTF-8, so I can't change the encoding string.
>
> Is there an easy way in which I could convert the file from ASCII to
> UTF-8?

type TByteArray = array of Byte;

{ Encodes one single-byte character, assumed to be ISO-8859-1, as UTF-8. }
function UTF8Encode(const C: AnsiChar): TByteArray;
begin
  if (C < #$80) then
  begin { 0xxxxxxx }
    SetLength(Result, 1);
    Result[0] := $00 or ((Ord(C) shr 0) and $7f);
  end
  else { if (Ord(C) <= $7ff) then; always true for a single-byte character }
  begin { 110xxxxx 10xxxxxx }
    SetLength(Result, 2);
    Result[0] := $c0 or ((Ord(C) shr 6) and $1f);
    Result[1] := $80 or ((Ord(C) shr 0) and $3f);
  end;
end;

That is just to demonstrate the mapping, mind you. It is of course much
more practical to transform a whole string by inserting the extra bytes
where they are needed.
Do keep in mind that UTF-8 is an _encoding_. The resulting string no
longer contains characters (of whatever length), it contains bytes.
So in-place transformation is practical only on AnsiChar strings, whose
elements are single bytes.
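
A whole-string version might look like the sketch below. The name
Latin1ToUTF8 and the demo program around it are made up for illustration.
It assumes every input byte really is an ISO-8859-1 code point; if the file
is actually windows-1252, bytes $80..$9F would need a lookup table instead.
If your Delphi version has AnsiToUtf8 in the System unit (Delphi 6 and
later, as far as I know), that may be all you need, but note it converts
from the current system code page rather than from ISO-8859-1 specifically.

```pascal
program Latin1ToUtf8Demo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Hypothetical helper, not a library routine. Assumes S holds ISO-8859-1
  bytes, so each byte value equals its Unicode code point. }
function Latin1ToUTF8(const S: AnsiString): AnsiString;
var
  I, J: Integer;
  B: Byte;
begin
  { Worst case: every input byte becomes two output bytes. }
  SetLength(Result, 2 * Length(S));
  J := 0;
  for I := 1 to Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then
    begin { 0xxxxxxx: plain ASCII passes through unchanged }
      Inc(J);
      Result[J] := AnsiChar(B);
    end
    else
    begin { 110xxxxx 10xxxxxx }
      Inc(J);
      Result[J] := AnsiChar($C0 or (B shr 6));
      Inc(J);
      Result[J] := AnsiChar($80 or (B and $3F));
    end;
  end;
  SetLength(Result, J); { trim to the bytes actually written }
end;

var
  Source, Converted: AnsiString;
  I: Integer;
begin
  { Build 'caf' followed by byte $E9 (e-acute in ISO-8859-1). }
  Source := 'caf';
  Source := Source + AnsiChar($E9);
  Converted := Latin1ToUTF8(Source);
  { Print the result as hex bytes: 63 61 66 C3 A9 }
  for I := 1 to Length(Converted) do
    Write(IntToHex(Ord(Converted[I]), 2), ' ');
  WriteLn;
end.
```

The output buffer is sized once for the worst case and trimmed at the end,
which avoids growing the string byte by byte.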

Groetjes,
Maarten Wiltink

