Suggested Alternative Unicode Implementation (for Rudy+ misc others)
- From: Jolyon Smith <jsmith@xxxxxxxxxxxxx>
- Date: Fri, 7 Mar 2008 17:01:28 +1300
Without going into the whys and wherefores behind this post, here, for
the benefit of anyone who missed it before, is one idea for an
alternative Unicode implementation in Tiburon that would avoid many of
the pitfalls that the implementation (as currently described) is going
to encounter (and cause).
The suggestion/idea:
Extend String RTTI (for the purposes of this post, RTTI here refers to
the runtime properties of a string, i.e. Length and Reference Count).
String RTTI would be extended to include encoding information. For
access efficiency this would likely be a 32-bit value.
Some encoding values would be reserved for specific system
interpretation, representing:
UTF8
UTF16
UTF32
ANSI (system cp)
Remaining values would identify a specific codepage of an ANSI encoded
string.
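To make the extended header concrete, here is a rough model in Python (not Delphi, since none of this exists in the compiler). The names StringHeader and describe_encoding, and the reserved tag values, are invented for illustration; the only thing taken from the idea above is a 32-bit tag where a few values are reserved and everything else is read as a codepage number.

```python
from dataclasses import dataclass

# Hypothetical reserved tag values; the proposal does not assign numbers.
# They sit above the 16-bit range so they cannot clash with a codepage number.
ENC_UTF8 = 0xFFFF0001
ENC_UTF16 = 0xFFFF0002
ENC_UTF32 = 0xFFFF0003
ENC_ANSI_SYSTEM = 0xFFFF0004

RESERVED_NAMES = {
    ENC_UTF8: "UTF8",
    ENC_UTF16: "UTF16",
    ENC_UTF32: "UTF32",
    ENC_ANSI_SYSTEM: "ANSI (system cp)",
}


@dataclass
class StringHeader:
    """Model of the per-string runtime data: the two existing fields plus the tag."""
    ref_count: int
    length: int
    encoding: int  # the proposed 32-bit encoding value

    def __post_init__(self):
        if not 0 <= self.encoding <= 0xFFFFFFFF:
            raise ValueError("encoding tag must fit in 32 bits")


def describe_encoding(tag: int) -> str:
    """Reserved values name an encoding; any other value is an ANSI codepage."""
    return RESERVED_NAMES.get(tag, f"ANSI codepage {tag}")


header = StringHeader(ref_count=1, length=6, encoding=1251)
print(describe_encoding(header.encoding))
print(describe_encoding(ENC_UTF16))
```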
i.e. at the implementation level, there would continue to be only one
actual "type" of string, but the formal type of any given instance of a
String would include its encoding.
There would exist, for the purposes of declarations in code:
UTF8String
UTF16String
UTF32String
ANSIString
and
String
String would "map" to one of the string types based on a project
setting. i.e. for an existing application one would most likely choose
to continue with String => ANSIString, but for a new application one
could choose to map String to the UTF encoding of Unicode most
appropriate to that application's needs.
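The mapping of String is just a lookup driven by the project setting. A minimal sketch in Python, assuming a hypothetical resolve_type helper and a project_string_type setting (neither name comes from any real compiler option):

```python
# The four concrete string types from the list above.
CONCRETE_TYPES = ("UTF8String", "UTF16String", "UTF32String", "ANSIString")


def resolve_type(declared: str, project_string_type: str = "ANSIString") -> str:
    """Return the concrete type a declaration would compile to.

    'String' follows the project setting; the explicit types never change.
    """
    if project_string_type not in CONCRETE_TYPES:
        raise ValueError(f"unknown project setting: {project_string_type}")
    if declared == "String":
        return project_string_type
    if declared in CONCRETE_TYPES:
        return declared
    raise ValueError(f"not a string type: {declared}")


# Existing application: String stays ANSI.
print(resolve_type("String"))
# New application that picks UTF16 for String.
print(resolve_type("String", project_string_type="UTF16String"))
```

An explicitly declared UTF8String stays UTF8String whatever the project setting is, which is what lets old and new units be mixed.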
RTL support for strings would be extended to incorporate appropriate,
implicit transcodings. For ANSI => Unicode these would be lossless. For
Unicode => ANSI the compiler could emit a warning.
Specific transcoding support would provide the means for addressing such
warnings if it were not desirable to simply disable that warning in a
project.
e.g. given that the VCL would be fully Unicode:
var
s: ANSIString; // or String where String => ANSIString
s := Edit1.Text; // WARN: Implicit conversion from Unicode to ANSI
The warning could be addressed by either:
- Changing the declaration of 's' to any Unicode string type
(UTF8, 16, 32)
or
- Utilising an explicit transcode:
s := UnicodeToANSI(Edit1.Text);
or
s := UnicodeToANSI(Edit1.Text, cp1251); // etc
or
- Disabling the warning in the project options (likely to be
acceptable for the majority of existing ANSI applications)
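The difference between the implicit and explicit routes can be modelled at runtime in Python. This is only a sketch under assumptions: the real warning would be issued by the compiler, not at runtime; TaggedStr, assign and unicode_to_ansi are invented names; Python codec names such as "cp1251" stand in for the encoding tag; and "cp1252" stands in for the system codepage.

```python
from dataclasses import dataclass

UNICODE_ENCODINGS = {"utf-8", "utf-16-le", "utf-32-le"}


@dataclass(frozen=True)
class TaggedStr:
    data: bytes    # payload bytes
    encoding: str  # stands in for the proposed encoding tag


def assign(src: TaggedStr, dest_encoding: str):
    """Model 'dest := src'. Returns the stored value and any diagnostics."""
    diagnostics = []
    if src.encoding == dest_encoding:
        return src, diagnostics
    if src.encoding in UNICODE_ENCODINGS and dest_encoding not in UNICODE_ENCODINGS:
        # Only this direction can lose characters, so only it is flagged.
        diagnostics.append("WARN: Implicit conversion from Unicode to ANSI")
    text = src.data.decode(src.encoding)
    converted = text.encode(dest_encoding, errors="replace")
    return TaggedStr(converted, dest_encoding), diagnostics


def unicode_to_ansi(src: TaggedStr, codepage: str = "cp1252") -> TaggedStr:
    """Explicit transcode: the programmer has accepted the loss, so no diagnostic."""
    text = src.data.decode(src.encoding)
    return TaggedStr(text.encode(codepage, errors="replace"), codepage)


edit_text = TaggedStr("Привет".encode("utf-16-le"), "utf-16-le")

s, diags = assign(edit_text, "cp1251")          # s := Edit1.Text
print(diags)
s_explicit = unicode_to_ansi(edit_text, "cp1251")  # s := UnicodeToANSI(Edit1.Text, cp1251)
print(s_explicit == s)
```

Transcoding the same text to a codepage that cannot represent it (here the "cp1252" default) replaces every character with "?", which is the loss the warning is there to point out.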
Note that explicit transcoding for ANSI=>Unicode is not required (in
order to address warnings) since such transcoding could be lossless
thanks to the specific codepage of the source and the required UTF
encoding of the destination being available in the RTTI, and so would
not require any warnings:
e.g.
Edit1.Text := s; // Edit1.Text is UTF16, codepage of ANSI s
// is in RTTI. Compiler silently injects RTL
// transcoding for lossless conversion
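Why the stored codepage matters can be shown with a small round trip in Python. The helper names are invented for this sketch, and Python's codecs stand in for whatever the RTL would actually call; the example assumes every byte of the source is defined in its codepage.

```python
def ansi_to_utf16(data: bytes, codepage: str) -> bytes:
    """ANSI => UTF16, using the codepage recorded with the string."""
    return data.decode(codepage).encode("utf-16-le")


def utf16_to_ansi(data: bytes, codepage: str) -> bytes:
    """UTF16 => ANSI, strict: raises if a character has no slot in the codepage."""
    return data.decode("utf-16-le").encode(codepage)


original = "Привет".encode("cp1251")      # ANSI string tagged with codepage 1251

utf16 = ansi_to_utf16(original, "cp1251")  # Edit1.Text := s
round_trip = utf16_to_ansi(utf16, "cp1251")
print(round_trip == original)              # nothing lost going there and back

# Same bytes, but the codepage is guessed rather than read from the string:
guessed = ansi_to_utf16(original, "cp1252").decode("utf-16-le")
print(guessed == "Привет")                 # the text comes out as different characters
```

With the codepage carried alongside the bytes there is nothing to guess, so the conversion to any UTF encoding can be injected silently.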
In general, the only encoding characteristic of a string that may be
changed would be the codepage of an ANSI string.
It would not be possible to otherwise change the encoding of a string
"in place". Attempting to do so, or attempting operations that rely on
it being possible, would result in a compilation error:
i.e.
var
s: UTF8String;
s := UnicodeToANSI(Edit1.Text); // ERROR: Incompatible types
That's covered the basics I think. I'm running out of time (now gone
5pm on a Friday afternoon and I have to go collect my daughters from
after school care).
IANACW, so I would prefer it if people commenting on the idea could
concentrate on the idea and NOT on nitpicking about what is or isn't
"RTTI", what is or isn't an "encoding", what is or isn't "transcoding"
etc etc.
If any questions arise from inappropriate use of such terminology kindly
restrict comments on that score to clarifying for others, if such
clarification is genuinely needed.
Enjoy,
Jolyon Smith