Re: Attention: European C/C++/C#/Java Programmers-Call for Input



Paul K. McKneely wrote:
"David Brown" <david.brown@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:O_6dnYV7E52xuR_UnZ2dnUVZ8u2dnZ2d@xxxxxxxxxxx
I would suggest you start by giving up on all your thoughts of specific character sets. Simply make a straight decision now - you will use UTF-8. No other encodings - no Latin-1, no UTF-16, no home-made character sets, no extra fonts. Take it as a fixed decision and work with it for a few days to see how it fits your needs. Look at existing tools and source code that support UTF-8, and see how it can make your work easier and give a result that users might actually be able to *use*. If you really put in this effort and find that UTF-8 does not fit your needs, what have you lost? A couple of days' work here is a drop in the ocean compared to the man-years it will take to work with your home-made encoding, and you will at least have the benefit of a better understanding of your problem. You might even be able to explain it to other people in a way that makes sense.

I want you to know that I lost most of a good night's
sleep over this post. In my anguish my brain mulled
it over and I came up with a plan. First, I will give
you some background (and a great deal of credit
for my suffering :). Original conception for ?Text
was circa 1985. Actual development began in 1988.
It is basically a superset of ASCII. The ASCII part,
as you well know, is not proprietary. But the key
point is that ?Text began in 1985 as a
byte-order-independent streaming format (as well as a flat 32-bit
character format) much like UTF-8 which itself
was in flux until late 2003. Both ?Text and UTF-8
use the high-order bit to determine what comes next
as escape bytes in their byte stream encoding.
Although streaming ?Text is much more all-inclusive
than UTF-8, its symbol set is not as large (which is
really all there is to UNICODE anyway). So I am
really not just starting out, as you might have the
impression. I already have probably 10 full man-years
into this. I started working on the
5th generation of the ?Text editor around
August and have been working on the 2nd
generation compiler for about a year.
It is early enough that I could change the course of the
5th generation editor, but I would have to start
completely over on the compiler (3rd generation).
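
For readers following the remark above that UTF-8 uses the high-order bit to signal what comes next in the byte stream, here is a minimal Python sketch of that rule. It covers UTF-8 only, since the post does not describe how the ?Text stream is laid out. The function names `utf8_sequence_length` and `split_utf8` are made up for this illustration and do not come from any library.

```python
from typing import List


def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` occupies."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: high bit clear, plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx: one continuation byte follows
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx: two continuation bytes follow
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx: three continuation bytes follow
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never occur.
    raise ValueError("byte 0x%02X cannot start a UTF-8 sequence" % lead)


def split_utf8(data: bytes) -> List[bytes]:
    """Split a UTF-8 byte string into one byte chunk per encoded character.

    This checks only the lead byte and the 10xxxxxx pattern of the
    continuation bytes; it is not a full validator.
    """
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunk = data[i:i + n]
        if len(chunk) != n or any((b & 0xC0) != 0x80 for b in chunk[1:]):
            raise ValueError("truncated or malformed sequence at offset %d" % i)
        chunks.append(chunk)
        i += n
    return chunks
```

Because every ASCII byte has its high bit clear and every byte of a multi-byte sequence has it set, an ASCII-only tool can pass UTF-8 text through unchanged, and no byte-order question arises.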

I really have a lot more than you probably think
at stake. My business partner (in a medical
networking device communications business) keeps
urging me that I need to think about retiring in about
10 years. If I were to abandon what I have
already done, the whole thing would collapse and
I would have little more than UNICODE left.
Rather than do this, I would just give up and
do something completely different.


I'm beginning to get a vague idea of what you are talking about. When you gave the domain name, I was able to guess that the character your newsreader fails to post in "?Text" is a phi, and googling for "phitext" gave me this:

<http://lists.planix.com/pipermail/lout-users/1995q4/000297.html>

It also gave some hits for source code, such as this:

<http://read.pudn.com/downloads62/sourcecode/compiler/215357/compiler/PHITEXT.C__.htm>

As far as I can see, back in 1988 you were interested in producing a general but efficient way to encode a wider range of characters than ASCII. You had slightly different priorities than the Unicode folks, who were starting at the same sort of time. In particular, you have far fewer possible characters (using 11 bits for a total of 1536), but unlike Unicode you use another 21 bits to store visual information such as text styles, weights, fonts, and colour.
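
To make the 11 + 21 bit split concrete, here is a rough Python sketch of packing a character code and its style information into one 32-bit cell. The bit positions are an assumption (character code in the low 11 bits, style in the upper 21); the actual phiText layout is not given in this thread, and the names `pack_cell` and `unpack_cell` are hypothetical.

```python
CHAR_BITS = 11   # character code width, as described above
STYLE_BITS = 21  # style, weight, font and colour bits, as described above

CHAR_MASK = (1 << CHAR_BITS) - 1    # 0x7FF
STYLE_MASK = (1 << STYLE_BITS) - 1  # 0x1FFFFF


def pack_cell(char_code: int, style: int) -> int:
    """Combine a character code and style bits into one 32-bit value.

    Assumed layout (not confirmed by the thread): style in bits 31..11,
    character code in bits 10..0.
    """
    if not 0 <= char_code <= CHAR_MASK:
        raise ValueError("character code does not fit in 11 bits")
    if not 0 <= style <= STYLE_MASK:
        raise ValueError("style does not fit in 21 bits")
    return (style << CHAR_BITS) | char_code


def unpack_cell(cell: int) -> tuple:
    """Split a 32-bit cell back into (char_code, style)."""
    return cell & CHAR_MASK, (cell >> CHAR_BITS) & STYLE_MASK
```

An 11-bit field can hold values 0 to 2047; the sketch does not model which of those values the 1536 defined characters occupy. With this assumed layout, a cell whose style bits are all zero is numerically equal to its character code, so plain ASCII text would keep its usual values.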

To support this system, you have been working on a text editor, a compiler, and an embedded operating system. You are now working with conversion tools so that a programmer could store their source code in Unicode, and translate it into phiText for your compiler.

Am I right so far?

How much of this software is actually developed? How many people are involved in creating it? How many users do you have? Has it actually been used in real systems?

mvh.,

David



<snip rest to save space>