Re: Editor component



On 2007-02-18, Hans-Peter Diettrich <DrDiettrich1@xxxxxxx> wrote:
Why is it simplified? For proper Unicode support, you still have to look out
for surrogates in a 2-byte system.

Unicode doesn't guarantee that a single display character is represented
by a single code point. Therefore I intend to simplify things and
assume UCS-2, with 1 code point for 1 character. If somebody wants to
implement more sophisticated rendering, he can override all the
related methods in a derived class. Then proportional fonts may also
become acceptable, along with other things that are not required in a
source code editor.
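
To make the distinction concrete, here is a minimal sketch (in Python rather than Pascal, purely for illustration) that counts UTF-16 code units, code points, and checks whether a string fits the "UCS-2, 1 code point per character" assumption. The helper names `utf16_units` and `fits_ucs2` are made up for this example and are not part of any editor component discussed here.

```python
import unicodedata

def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def fits_ucs2(text: str) -> bool:
    """True if every code point lies in the Basic Multilingual Plane,
    i.e. no surrogate pairs would be needed (hypothetical helper)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

clef = "\U0001D11E"      # MUSICAL SYMBOL G CLEF, outside the BMP
e_acute = "e\u0301"      # 'e' followed by COMBINING ACUTE ACCENT

# One code point, but two UTF-16 code units (a surrogate pair).
clef_info = (len(clef), utf16_units(clef), fits_ucs2(clef))

# Two code points that render as a single display character,
# even though both fit in UCS-2.
accent_info = (len(e_acute), utf16_units(e_acute), fits_ucs2(e_acute))

# Normalisation can merge this particular pair into one code point,
# but that does not hold for every combining sequence.
composed = unicodedata.normalize("NFC", e_acute)
```

So the UCS-2 assumption fails in two separate ways: surrogate pairs (one code point, two units) and combining sequences (several code points, one display character). A derived class that overrides the rendering methods would have to deal with both.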

It'd be a bit uncomfortable for me to start over without the legacy and then
design in such behaviour (even if overridable), even if only the more
elaborate East Asian character sets lie beyond the Basic Multilingual Plane.

From what I see in FPC bug reports, what people want is simply the ability to
comment in their own language (read: native character set) without constant
character set conversions and workarounds; in other words, the ability to do
anything Unicode without additional effort. Sometimes they also want to use,
e.g., identifiers with non-A..Z characters in them.

Compatibility with other compilers doesn't leave room for exotic
code pages in source code.

True. We also never implemented it. Some Spanish and Russian mods floated
around for a while though.

I'm not into the internationalisation part of FPC at all, but there is some
codepage setting for the source, possibly to allow the compiler to translate
ASCII literals in a codepage to widestring literals or so (and it would only
need a translation table for that codepage).
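
As a rough illustration of the "translation table" idea (a sketch in Python, not a description of how FPC actually does it), a single-byte codepage can be mapped to Unicode with one 256-entry table. The codepage `cp1251` and the names `build_table` and `translate_literal` are assumptions chosen for the example.

```python
def build_table(codepage: str) -> str:
    """Build a 256-entry table: index = byte value, entry = Unicode character.
    Bytes the codepage leaves undefined become U+FFFD."""
    return bytes(range(256)).decode(codepage, errors="replace")

def translate_literal(raw: bytes, table: str) -> str:
    """Translate a single-byte string literal into a Unicode string
    by plain table lookup, one byte at a time."""
    return "".join(table[b] for b in raw)

table = build_table("cp1251")

# Bytes as they would appear in a source file saved in Windows-1251.
raw_literal = "Привет".encode("cp1251")
wide_literal = translate_literal(raw_literal, table)
```

This only works this simply for single-byte codepages; multi-byte ones would need real decoding rather than a lookup table.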

With regard to a source code editor, the compiler and its acceptable
file formats come into play. Most compilers will accept Unicode only
inside string literals, and possibly in comments.

That's the beauty of UTF-8: it makes them easy to skip.

Skip characters in string literals or identifiers? ;-)

Comments, in this case. Literals still have to be parsed. Actually, I have
more experience with this kind of skipping for text-mode console support:
making sure that Crt writes complete code points to the terminal.
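
For what it's worth, the "only write complete code points" part can be sketched as follows. This is a Python illustration under my own assumptions, not the actual Crt code; `complete_prefix_len` and `complete_chunks` are hypothetical names. It relies only on the UTF-8 property that continuation bytes look like 10xxxxxx and that the lead byte announces the sequence length. Invalid input is passed through unchanged rather than repaired.

```python
def complete_prefix_len(buf: bytes) -> int:
    """Length of the longest prefix of buf that does not end
    in the middle of a UTF-8 sequence."""
    n = len(buf)
    i = n
    back = 0
    # Walk back over at most 3 trailing continuation bytes (10xxxxxx).
    while i > 0 and back < 3 and (buf[i - 1] & 0xC0) == 0x80:
        i -= 1
        back += 1
    if i == 0:
        return n  # no lead byte in sight: nothing sensible to hold back
    lead = buf[i - 1]
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        return n  # invalid lead byte: pass through unchanged
    have = n - (i - 1)  # bytes present from the lead byte onwards
    return n if have >= need else i - 1

def complete_chunks(chunks):
    """Re-split a stream of byte chunks so that every yielded piece
    ends on a code point boundary; incomplete tails are carried over."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        k = complete_prefix_len(pending)
        if k:
            yield pending[:k]
        pending = pending[k:]
    if pending:
        yield pending  # truncated input: flush whatever is left

data = "héllo €uro".encode("utf-8")
# Deliberately split every 3 bytes, which cuts the euro sign in half.
pieces = list(complete_chunks(data[i:i + 3] for i in range(0, len(data), 3)))
```

Each piece in `pieces` decodes on its own, so a terminal never sees half a code point. Note that this says nothing about combining sequences; it only keeps the byte stream well-formed.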