Re: System.WCh_Cnv



On Tue, 25 Jul 2006 14:03:21 +0100, Marius Amado-Alves wrote:

>> So I'm quite happy with UTF-8 and plain strings.

> I am more or less happy with this too [1], but I think we can do
> better. With UTF-8 in strings the two abstractions (code points and
> encodings) are too entangled for my taste. Strictly speaking, you
> cannot use the standard string operations.

Yes, not all of them.

> I mean, you can, but you must fiddle with
> the encodings, i.e. you are not searching for a code point but for a
> particular encoding of it. Instead I want to be able to write things like
>
>    for I in Str'Range loop
>       if Str (I) = Euro_Sign then
>          null;  --  ...
>       end if;
>    end loop;
>
> I cannot do that with UTF-8 in strings.

I do it this way:

   declare
      Index : Integer := Str'First;
      Value : UTF8_Code_Point;
   begin
      while Index <= Str'Last loop
         Get (Str, Index, Value);  --  decodes one code point, advances Index
         if Value = Euro_Sign then
            null;  --  ...
         end if;
      end loop;
   end;
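The post does not show Get or UTF8_Code_Point. Below is a minimal sketch of what such a Get might do, assuming it decodes one UTF-8 sequence starting at Index and advances Index past it. The type range, the Malformed exception and the demo program are hypothetical, and validation is deliberately incomplete (overlong forms and surrogates are accepted).

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure UTF8_Scan_Demo is
   --  Hypothetical stand-ins for the names used in the post.
   type UTF8_Code_Point is range 0 .. 16#10FFFF#;
   Euro_Sign : constant UTF8_Code_Point := 16#20AC#;

   --  "Price: 5" followed by the euro sign, encoded as E2 82 AC.
   Str : constant String :=
     "Price: 5" & Character'Val (16#E2#) & Character'Val (16#82#)
     & Character'Val (16#AC#);
   Index : Integer := Str'First;
   Value : UTF8_Code_Point;
   Count : Natural := 0;  --  number of code points decoded so far

   Malformed : exception;

   --  Decodes the UTF-8 sequence starting at Source (Pointer) into
   --  Result and advances Pointer past it. Overlong forms and
   --  surrogates are not rejected; other malformed input raises
   --  Malformed.
   procedure Get
     (Source  : String;
      Pointer : in out Integer;
      Result  : out UTF8_Code_Point)
   is
      Lead   : constant Natural := Character'Pos (Source (Pointer));
      Length : Positive;
      Code   : Natural;
   begin
      if Lead < 16#80# then
         Length := 1;
         Code   := Lead;
      elsif Lead in 16#C0# .. 16#DF# then
         Length := 2;
         Code   := Lead - 16#C0#;
      elsif Lead in 16#E0# .. 16#EF# then
         Length := 3;
         Code   := Lead - 16#E0#;
      elsif Lead in 16#F0# .. 16#F4# then
         Length := 4;
         Code   := Lead - 16#F0#;
      else
         raise Malformed;  --  continuation byte or invalid lead byte
      end if;
      if Pointer + Length - 1 > Source'Last then
         raise Malformed;  --  sequence truncated by the end of the string
      end if;
      for K in Pointer + 1 .. Pointer + Length - 1 loop
         declare
            Byte : constant Natural := Character'Pos (Source (K));
         begin
            if Byte not in 16#80# .. 16#BF# then
               raise Malformed;  --  expected a continuation byte
            end if;
            Code := Code * 64 + (Byte - 16#80#);
         end;
      end loop;
      if Code > 16#10FFFF# then
         raise Malformed;  --  beyond the Unicode range
      end if;
      Result  := UTF8_Code_Point (Code);
      Pointer := Pointer + Length;
   end Get;
begin
   while Index <= Str'Last loop
      Get (Str, Index, Value);
      Count := Count + 1;
      if Value = Euro_Sign then
         Put_Line ("Euro sign is code point" & Natural'Image (Count));
      end if;
   end loop;
end UTF8_Scan_Demo;
```

Each call consumes one to four bytes, which is why the loop walks byte positions with a while loop instead of a for loop over Str'Range.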

Actually, if Ada had abstract array interfaces and inheritance, we could
have it in exactly the form you wrote. Alas.

Note that the pattern you refer to goes beyond Unicode issues. Exactly the
same problem exists in pattern matching:

   while Index <= Str'Last loop
      if Match (Str, Index, Pattern) then
         null;  --  ...
      else
         Index := Index + 1;  --  no match here: move on
      end if;
   end loop;

Basically, it is a stream interface to strings with the ability to roll
back or, equivalently, to look ahead.
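Match is not shown in the post either. Here is a hypothetical literal-only version that makes the look-ahead/roll-back idea concrete: Index advances only when the pattern matches, so a failed attempt needs no undoing. It uses an in out parameter in a function, which Ada allows only since Ada 2012; the demo names are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Match_Demo is
   Str   : constant String := "a--b--c";
   Index : Integer := Str'First;
   Count : Natural := 0;  --  number of matches found

   --  Hypothetical literal-only Match. On success Pointer moves past the
   --  matched text; on failure Pointer is left where it was (the
   --  "roll back"). Empty patterns never match, to avoid an endless loop.
   function Match
     (Source  : String;
      Pointer : in out Integer;
      Pattern : String) return Boolean
   is
      Last : constant Integer := Pointer + Pattern'Length - 1;
   begin
      if Pattern'Length = 0
        or else Last > Source'Last
        or else Source (Pointer .. Last) /= Pattern
      then
         return False;
      end if;
      Pointer := Last + 1;
      return True;
   end Match;
begin
   while Index <= Str'Last loop
      if Match (Str, Index, "--") then
         Count := Count + 1;   --  Index is already past the match
      else
         Index := Index + 1;   --  no match here: move on one character
      end if;
   end loop;
   Put_Line ("Matches:" & Natural'Image (Count));
end Match_Demo;
```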

> Note that Wide_Wide_String is
> of little help here, because of the endianness issue. But it might be
> a good idea to base Unico on Wide_Wide_String for closeness to the
> standard.
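This thread predates Ada 2012, which added Ada.Strings.UTF_Encoding. With it, one way to get the loop asked for above is to decode the UTF-8 data into a Wide_Wide_String first, at the cost of (typically) four bytes of storage per code point. Inside the program byte order plays no role; it only matters when such a string is exchanged as raw 32-bit units. A sketch, with illustrative demo names:

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Decode_Demo is
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   Euro_Sign : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#20AC#);

   --  UTF-8 bytes of "Price: 5" followed by the euro sign (E2 82 AC).
   Raw : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     "Price: 5" & Character'Val (16#E2#) & Character'Val (16#82#)
     & Character'Val (16#AC#);

   --  After decoding there is exactly one array element per code point,
   --  so the plain for loop over Str'Range works.
   Str      : constant Wide_Wide_String := WWS.Decode (Raw);
   Position : Natural := 0;
begin
   for I in Str'Range loop
      if Str (I) = Euro_Sign then
         Position := I;
      end if;
   end loop;
   Put_Line ("Euro sign at index" & Natural'Image (Position));
end Decode_Demo;
```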

I prefer general solutions, like array interfaces. You have an opaque
object. Add an array interface to it, which would return code points or
Wide_x_100_Character or whatever you want. There you are.
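Ada 2012 later added user-defined indexing (the Constant_Indexing aspect), which covers part of the "array interface" idea: an opaque type can be indexed with ordinary array syntax. The UTF8_Text type below is hypothetical; it keeps its data as UTF-8 and re-decodes on every access, so each Item (I) costs O(n). It is a sketch of the syntax, not an efficient design, and since UTF8_Text is not an array type, Item'Range is unavailable and the loop needs an explicit bound.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Unbounded;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Indexing_Demo is
   package SU renames Ada.Strings.Unbounded;
   package WWS renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical opaque text type: stored as UTF-8, indexed by code point.
   package UTF8_Texts is
      type UTF8_Text is tagged private
        with Constant_Indexing => Code_Point;

      function To_Text (UTF8 : String) return UTF8_Text;
      function Code_Point_Count (T : UTF8_Text) return Natural;
      function Code_Point
        (T : UTF8_Text; Index : Positive) return Wide_Wide_Character;
   private
      type UTF8_Text is tagged record
         Bytes : SU.Unbounded_String;
      end record;
   end UTF8_Texts;

   package body UTF8_Texts is
      --  Decodes the whole text on every call: O(n) per access, kept
      --  simple on purpose.
      function Decoded (T : UTF8_Text) return Wide_Wide_String is
      begin
         return WWS.Decode (SU.To_String (T.Bytes));
      end Decoded;

      function To_Text (UTF8 : String) return UTF8_Text is
      begin
         return (Bytes => SU.To_Unbounded_String (UTF8));
      end To_Text;

      function Code_Point_Count (T : UTF8_Text) return Natural is
      begin
         return Decoded (T)'Length;
      end Code_Point_Count;

      function Code_Point
        (T : UTF8_Text; Index : Positive) return Wide_Wide_Character
      is
         D : constant Wide_Wide_String := Decoded (T);
      begin
         return D (D'First + Index - 1);  --  Constraint_Error if too large
      end Code_Point;
   end UTF8_Texts;

   use UTF8_Texts;

   Euro_Sign : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#20AC#);
   Item : constant UTF8_Text :=
     To_Text ("Price: 5" & Character'Val (16#E2#) & Character'Val (16#82#)
              & Character'Val (16#AC#));
   Position : Natural := 0;
begin
   for I in 1 .. Code_Point_Count (Item) loop
      --  Generalized indexing: Item (I) means Code_Point (Item, I).
      if Item (I) = Euro_Sign then
         Position := I;
      end if;
   end loop;
   Put_Line ("Euro sign at code point" & Natural'Image (Position));
end Indexing_Demo;
```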

> [1] What makes me happy about UTF-8 is that it seems to have become a
> de facto default, common-denominator encoding.

Long live Linux! (:-))

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de