Re: System.WCh_Cnv
- From: "Dmitry A. Kazakov" <mailbox@xxxxxxxxxxxxxxxxx>
- Date: Tue, 25 Jul 2006 15:36:42 +0200
On Tue, 25 Jul 2006 14:03:21 +0100, Marius Amado-Alves wrote:
>> So I'm quite happy with UTF-8 and plain strings.
>
> I am more or less happy with this too [1], but I think we can do
> better. With UTF-8 in strings the two abstractions (codepoints,
> encodings) are too entangled for my taste. Strictly speaking, you
> cannot use the standard string operations.

Yes, not all of them.

> I mean you can, but you must fiddle with the encodings, i.e. you are
> not searching for a codepoint but for a particular encoding. Instead
> I want to be able to write things like
>
>    for I in Str'Range loop
>       if Str (I) = Euro_Sign then ...
>    end loop;
>
> I cannot do that with UTF-8 in strings.

I do it this way:

   declare
      Index : Integer := Str'First;
      Value : UTF8_Code_Point;
   begin
      while Index <= Str'Last loop
         Get (Str, Index, Value);
         if Value = Euro_Sign then ...
      end loop;

Actually if Ada had abstract array interfaces and inheritance we could have
it in exactly the form you wrote it. Alas.
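
A minimal, self-contained sketch of that loop follows. The Code_Point type and the body of Get are assumptions made for illustration, not the code of any particular library, and the decoder assumes well-formed UTF-8 (it does no validation).

```ada
with Ada.Text_IO;

procedure UTF8_Scan_Demo is

   --  Illustrative type; the original post calls it UTF8_Code_Point.
   type Code_Point is range 0 .. 16#10FFFF#;

   Euro_Sign : constant Code_Point := 16#20AC#;

   --  "a", EURO SIGN, "b" in UTF-8: the euro sign is the octets E2 82 AC.
   Str : constant String :=
     "a" & Character'Val (16#E2#) & Character'Val (16#82#)
         & Character'Val (16#AC#) & "b";

   Index : Integer := Str'First;
   Value : Code_Point;
   Count : Natural := 0;

   --  Hypothetical decoder: reads the code point whose encoding starts
   --  at Source (Index) and advances Index past that encoding.
   --  Assumes well-formed input; malformed or truncated input may
   --  raise Constraint_Error.
   procedure Get
     (Source : String;
      Index  : in out Integer;
      Value  : out Code_Point)
   is
      Lead  : constant Natural := Character'Pos (Source (Index));
      Extra : Natural;  --  number of continuation octets
      Acc   : Natural;  --  bits accumulated so far
   begin
      if Lead < 16#80# then
         Extra := 0;
         Acc   := Lead;
      elsif Lead < 16#E0# then
         Extra := 1;
         Acc   := Lead mod 16#20#;
      elsif Lead < 16#F0# then
         Extra := 2;
         Acc   := Lead mod 16#10#;
      else
         Extra := 3;
         Acc   := Lead mod 16#08#;
      end if;
      Index := Index + 1;
      for K in 1 .. Extra loop
         --  Each continuation octet contributes its low six bits.
         Acc   := Acc * 64 + Character'Pos (Source (Index)) mod 16#40#;
         Index := Index + 1;
      end loop;
      Value := Code_Point (Acc);
   end Get;

begin
   while Index <= Str'Last loop
      Get (Str, Index, Value);
      if Value = Euro_Sign then
         Ada.Text_IO.Put_Line ("Euro sign found");
      end if;
      Count := Count + 1;
   end loop;
   Ada.Text_IO.Put_Line ("Code points:" & Natural'Image (Count));
end UTF8_Scan_Demo;
```

The point of the shape is that Index counts octets while Value is a code point, so the comparison with Euro_Sign never has to look at the encoding.
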
Note that the pattern you refer to goes beyond Unicode issues. Exactly the
same problem exists in pattern matching:

   while Index <= Str'Last loop
      if Match (Str, Index, Pattern) then ...
   end loop;

Basically it is a stream interface to strings with an ability to roll it
back or, equivalently, to look ahead.
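
As a rough sketch of what "roll back" can mean here: a matcher that advances Index only on success and leaves it untouched on failure. The Match below is hypothetical and handles only literal patterns; a function with an "in out" parameter also assumes an Ada 2012 compiler, in older Ada it would have to be a procedure with a Boolean "out" parameter.

```ada
with Ada.Text_IO;

procedure Match_Demo is

   Str   : constant String := "key=value";
   Index : Integer := Str'First;

   --  Hypothetical literal matcher. If Pattern occurs in Source
   --  starting at Index, consume it (advance Index) and return True.
   --  Otherwise leave Index as it was, which is the "roll back".
   function Match
     (Source  : String;
      Index   : in out Integer;
      Pattern : String) return Boolean
   is
      Last : constant Integer := Index + Pattern'Length - 1;
   begin
      if Pattern'Length > 0
        and then Last <= Source'Last
        and then Source (Index .. Last) = Pattern
      then
         Index := Last + 1;
         return True;
      end if;
      return False;
   end Match;

begin
   while Index <= Str'Last loop
      if Match (Str, Index, "=") then
         Ada.Text_IO.Put_Line
           ("Separator consumed, next index is" & Integer'Image (Index));
      else
         Index := Index + 1;  --  nothing matched here: skip one character
      end if;
   end loop;
end Match_Demo;
```
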

> Note that Wide_Wide_String is of little help here, because of the
> endianness issue. But it might be a good idea to base Unico on
> Wide_Wide_String for closeness to the standard.

I prefer general solutions, like array interfaces. You have an opaque
object. Add an array interface to it, which would return code points or
Wide_x_100_Character or whatever you want. Here you are.

> [1] What makes me happy about UTF-8 is that it seems to have become a
> de facto default, common denominator encoding.

Long live Linux! (:-))
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de