Re: Enhanced Unicode support for "Go" tools

From: Beth (BethStone21_at_hotmail.NOSPICEDHAM.com)
Date: 05/20/04


Date: Thu, 20 May 2004 13:36:10 +0100

The Wannabee wrote:
> Beth wrote:
> > _PEOPLE_ and their culture and so forth...this takes beyond
> > mere "technical inaccuracy" into the realm of potentially
> > being "you offensive little git!!" type insult to some
> > people...
>
> :-)
>
> Beth, thanks for the posts. They made me think of UNICODE a
> little differently. I think I can actually take it from here,
> by experimenting, but still.

No, really, _you_ should experiment...find out for
_yourself_...there is no substitute for that...and how on Earth
do you think everyone else did it, anyway? ;)

> Let's pretend I had written a word processor, and stored each
> word separately in the data, and processed each word separately
> from the data (let's pretend). Then pretend I wrote a love
> letter to someone, and in the middle of it wanted to cite a
> Hebrew poem.

[ A Hebrew poem in the middle of a Love letter? Pretentious
nonsense...she wouldn't be impressed..._HONEST_ or don't even
bother...better a crap "roses are red" that is _MEANT_ than a
million pretentious "aren't I so damn clever?" terrible
illusions...I mean, you writing it for her or you writing it to
inflate your ego about how real clever you look? Declaring Love
or just trying to hook a "trophy" on your arm? Maybe you can't
tell the difference but she will be able to, I guarantee
you...though, you might be lucky...she might not care either
way...Love makes people do funny things, after all... ]

> 1. How does the processor/drawing routine detect the Hebrew?
> 2. And how does it then adjust itself to draw the Hebrew?
>
> For simplicity, let's assume it's just one line of Hebrew, say
> 4 words.

[ Woah! A "concise" poem, for sure ;) ]

> Thanks again for the effort. I think it's somewhat easier to
> read you than the original sources, as you use that chatty
> language.

Aaah, the point to remember is that UNICODE is a _character
set_...what you're seeing on the screen is a _FONT_...exactly
the same as ASCII (or ANSI) is a _character set_...but what
you're seeing on the screen is "Arial" or "Courier" or "Times
New Roman" _FONTS_...

So, this is a bit like asking - with ASCII - "How does the
processor / drawing routine detect that it's a digit and not an
uppercase character?"...it doesn't...UNICODE - like ASCII - is
_ONLY_ concerned with giving each character a numeric value...

For example:

In ASCII, 41h is a capital letter A...

In UNICODE, U+262E is the CND / Peace symbol (to explain,
UNICODE values are written "U+" just to say "this is a UNICODE
character" and then the number after the "U+" is _always_
hexadecimal by convention)...

Works more or less the same way, it's just that ASCII only
covers 128 "American" characters...while UNICODE can potentially
cover well over a million characters (though, in practice,
there's only tens of thousands currently defined...most of which
are in the first 16-bits...basically, all of them except for the
weirder "historical" and "special interest" scripts...plus,
there's all them "Kanji" characters (you know, the Chinese /
Korean / Japanese ideographic "word symbols" I was talking
about before ;)...there's thousands of those just to
themselves...and, in fact, they've attempted to _unify_ the
Chinese / Korean / Japanese characters together rather than list
thousands for them all separately, where many of the symbols -
at least visually - are common between them :)...

So, UNICODE is only about the _character set_...giving a
"number" to each character...
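To make the "it's only a number" point concrete, here is a minimal sketch in Python (picked purely for illustration; nothing in this thread uses it, and the helper name "u_plus" is made up). It relies only on the standard "unicodedata" module:

```python
import unicodedata

def u_plus(ch):
    """Format a character's numeric value in the conventional U+XXXX form."""
    return "U+%04X" % ord(ch)

# ASCII 41h and UNICODE U+0041 are the same capital letter A.
letter_a = chr(0x41)

# The peace symbol from the example above; the digits after "U+" are hex.
peace = chr(0x262E)

for ch in (letter_a, peace):
    # Prints the code point and the character's official name.
    print(u_plus(ch), unicodedata.name(ch))
```

Nothing here says anything about how either character looks on screen; that part is still down to the font.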

It's the fonts, the OS and the application which work together
to get it displayed on the screen properly...the fonts being a
big part of that because they define the shape of the character
and implicitly give the "metrics" for spacing it out...the OS,
of course, handles fonts...that is, you "LOGFONT" and all that
stuff to get the font and then just "DrawText" or
"GetTextMetrics" or whatever it is you need to do...all that
stuff is the same as with any other Windows font (in fact, some
of the fonts you've already been doing this to probably have
some UNICODE enabled...the "freebie" fonts like Arial are hardly
comprehensive UNICODE fonts but they are UNICODE fonts)...

The thing the application needs to do is "detect" certain things
like "is this character in the Hebrew range?"...and, if it is,
account for the fact that it's writing right-to-left, not
left-to-right in how it deals with it...as I say, though,
UNICODE defines the characters and the OS and the fonts define
most of the display stuff...just a case of the application
relating the two...note that, of course, your text editor
application would already be responsible for working out line
breaks and positioning the text in the window and so
forth...this is no different...it's just the case with UNICODE
that the range of characters is massively increased...there's
more things to remember like languages that write left-to-right
_AND_ right-to-left...so you'll need to add in some extra
program logic for that (but UNICODE does try to help with things
like "directionality" characters...so you can embed changes of
direction and there is a "standard algorithm" for these things
already defined in the UNICODE standard to help out :)...
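As a rough illustration of the "is this character in the Hebrew range?" check, here is a Python sketch. The function names are hypothetical, and the run-splitting is a deliberate simplification: it is NOT the standard bidirectional algorithm mentioned above, just the general idea of grouping characters by direction before drawing them.

```python
import unicodedata

# The Hebrew block runs from U+0590 to U+05FF.
HEBREW_FIRST, HEBREW_LAST = 0x0590, 0x05FF

def is_hebrew(ch):
    """Plain range check on the character's numeric value."""
    return HEBREW_FIRST <= ord(ch) <= HEBREW_LAST

def is_rtl(ch):
    """True for strongly right-to-left characters ("R" or "AL" classes)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

def direction_runs(text):
    """Split text into (direction, substring) runs.

    Simplification: neutral characters such as spaces and punctuation
    simply join whatever run is currently open.
    """
    runs, buf, current = [], [], None
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            direction = "rtl"
        elif bidi == "L":
            direction = "ltr"
        else:
            direction = current or "ltr"
        if direction != current and buf:
            runs.append((current, "".join(buf)))
            buf = []
        current = direction
        buf.append(ch)
    if buf:
        runs.append((current, "".join(buf)))
    return runs

# One Hebrew word (written with escapes) between two English ones.
line = "abc \u05e9\u05dc\u05d5\u05dd xyz"
for direction, chunk in direction_runs(line):
    print(direction, [hex(ord(c)) for c in chunk])
```

An application would then lay out each "rtl" run from the right, using the font metrics as usual; a real editor should follow the algorithm in the standard rather than this shortcut.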

Plus, my UNICODE book comes with a CD...and on the CD is a
big text file that lists out, for each character, a bunch of
"attributes"...such as number, name (each character has an
"official name"...though, the only application that should ever
really need that is "character map" so it can list the correct
names for the characters you select :), case (is it uppercase /
lowercase / no case), direction (default direction...English is
left-to-right by default, Arabic right-to-left), etc.,
etc....so, another (sensible) approach is to just write a little
utility that automatically parses the file (designed to be
easily machine read :) and builds up some "table" that an
application can use...
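A sketch of such a utility, in Python. It assumes the file uses the semicolon-separated layout of "UnicodeData.txt" from the Unicode Character Database (an assumption; the post does not name the file), and the four sample lines are a tiny excerpt written in that layout. The function name and the dictionary keys are made up for the example.

```python
# A few lines in the assumed UnicodeData.txt layout:
# code;name;category;combining;bidi;...;uppercase;lowercase;titlecase
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
05D0;HEBREW LETTER ALEF;Lo;0;R;;;;;N;;;;;
262E;PEACE SYMBOL;So;0;ON;;;;;N;;;;;
"""

def parse_unicode_data(lines):
    """Build a {code point: attributes} table from semicolon-separated lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        table[code] = {
            "name": fields[1],
            "category": fields[2],   # e.g. Lu = uppercase letter
            "bidi": fields[4],       # e.g. L = left-to-right, R = right-to-left
            "upper": int(fields[12], 16) if fields[12] else None,
            "lower": int(fields[13], 16) if fields[13] else None,
        }
    return table

table = parse_unicode_data(SAMPLE.splitlines())
for code in sorted(table):
    entry = table[code]
    print("U+%04X" % code, entry["name"], entry["category"], entry["bidi"])
```

With the real file you would pass an open file object instead of the sample string, and probably keep only the columns the application actually needs.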

Also, of course, if you're using a HLL like C or Pascal or BASIC
then, chances are, someone has already written a "UNICODE
library" that you can use...that is, with C for example, a
"libc" standard library where it has wide versions of "strlen",
"sprintf" and all those other string functions ("wcslen",
"swprintf" and so on) implemented to work with "wchar_t" (the
latest C standards include "wchar_t")...hence, if you're
using C, then you can let the fonts deal with the "looks", the
OS deal with the fonts, the special UNICODE "libc" deal
with most of the string operations for you and then it's pretty
easy to get it UNICODE enabled...

Note, of course, that you have all those "W" API
functions...those are the UNICODE Windows API...just create
16-bit wide strings with UNICODE characters rather than ANSI
characters and call "DrawTextW" rather than
"DrawTextA"...presuming you've the fonts installed for the
characters you're asking for, then it'll print out U+262E (the
peace symbol), just as happily as you're printing out ANSI 41h
(capital "A") at the moment...
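To show what a "16-bit wide string" holds, here is a small Python sketch (it does not call any Windows API; the helper name "utf16_units" is made up). It lists the 16-bit units that a string becomes in UTF-16, which is the encoding assumed here for those wide strings:

```python
import struct

def utf16_units(text):
    """Return the 16-bit units of text when encoded as little-endian UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# Capital A and the peace symbol each fit in a single 16-bit unit.
print([hex(u) for u in utf16_units("A\u262e")])

# A character beyond the first 16 bits (U+1D11E, a musical G clef)
# is stored as a pair of 16-bit units, a so-called surrogate pair.
print([hex(u) for u in utf16_units("\U0001D11E")])
```

So "one 16-bit unit per character" holds for everything in the first 16 bits, but code that counts or indexes characters has to allow for the two-unit case.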

> PS: You seem, like Betov says, a bit pedantic maybe (no insult).

Funny, ain't it?

One set of people attack me for not being pedantic enough and
letting "technical inaccuracies" flow because I was being too
lax and colloquial and so forth...another set of people attack
me for being too pedantic...

In this case, we are talking _people_ and their culture and
their language...I'm not under the orders of Donald Rumsfeld...

So, please, recognise the difference between _RESPECT for fellow
human beings, their languages and culture_ (to which they hold
dear...heck, Rene should know better than most...France has a
society for the protection of French from English
influence...France is also renowned in Europe's "Eurovision Song
Contest" (well done to the Ukraine, by the way...as our Irish TV
presenter, Terry Wogan put it: "look, it's Xena, Warrior
Princess"...the Ukraine entry had them all dressed in Xena-like
leather costumes, doing the "wild dance", as the song was
called...it deserved to win because it had some "oomph" to it
while most of the other entries were trying the old "Love ballad
about World Peace" angle to try to get all the votes...but the
Ukraine went with "men and women in skimpy leather costumes
dancing about like mad people"...it had some actual _energy_ to it!
So, next year, we're all off to the Ukraine...cool! It's really
tacky and dreadful and most of the songs are crap but I just
Love the Eurovision...in fact, the British commentator, Terry
Wogan, makes it a joy to watch because he takes the mickey out
of it all but in a nice way...he really does Love the contest
but it's his style to be sarcastic all the time...but he got
caught making a sarcastic comment about another country's entry
in an earlier Eurovision and that country didn't quite get the
joke...Terry makes a joke of everything in the Eurovision, you
see...he really does Love it because he's been commentator for
about 30 years(!!) and still does it every year...but it's just
his style to do it in a kind of, well, "Beavis and Butthead"
kind of way...like calling the singer of the Ukraine entry
"Xena, Warrior Princess" because she was running around in
leather, dressed like Xena...it really is laughing _with_ and
not laughing at...it's just his style...he's like that with all
the TV programmes he does...he's actually very funny because he
looks like a gentle old man who wouldn't say a word wrong but
his entire style is just to be sarcastic about
_everything_...the other countries really should NOT take it in
a bad way :) that, unlike many of the other countries, it never
bows to singing in English and all the voting is repeated back
in French, as well as English, and while all the other countries
have the person who reports back the votes for that country
speaking English, France always maintains that their person
speaks _French_...mandating that one of the usually two
presenters of the show has to be a fluent French speaker to talk
to them and translate what they voted into English...note, at
least from the British perspective, we _Love_ the French's
position in standing firm on this...French is a far better
sounding language in song than English, anyway...it's a poetical
and romantic language...and it is meant to be an _international_
contest so hearing other languages is _part_ of the
deal...France did just the _once_ sing in English...I was
thoroughly disappointed, as I think were most people because it
did pathetically badly in the voting...and France has not bowed
to English since...and, in my opinion, it NEVER should...it
doesn't matter with singing and a song contest whether you can
understand what they're singing (and, anyway, switch on the
subtitles at the bottom of the screen and you get the best of
both worlds...the words come up in English so you can understand
them but you still get to hear it in the native language of
whatever country :) ) and mere "pedanticism" on this particular
topic...

> But when you are so clever, and always optimize every
> application to the hilt etc., can you not share a programme
> with us, open source?

I post up code snippets...will be working on two open source
projects...gotta write the darn things before "sharing" them...

> As good as you are that should be highly educational.

What would be even more "educational" for you, is if you learnt
the stuff to write all the code yourself, surely?

> And I'd love to read your stuff.

We all have our problems, I suppose...

> If you don't have web space, then I can host it for you, if
> less than 10 megs, + some HTML with some info. What do you say?

I share TBs of webspace with many others...it's called the
Google archives...a better read than a lone individual with an
ego filibustering their "religion", whatever it may be, because
you get the combined opinions, religions, facts and figures of
thousands of people all at the same time...and not in isolation
but in reaction and response to each other...every error
pedantically corrected, every gap meticulously filled...this is
"open source" development of the very human condition...why on
Earth would anyone be interested in just hearing me spout
nonsense on some website, when they can get all that right here
plus much, much more from hundreds of others too (and all the
interaction between everyone thrown in as a "bonus" ;)?

Only the "moment" is transient (but, then, it always is, isn't
it?), it's all on the archives...in a sense, would you really
prefer a bland, dry, static, formal "article" that I take all
the life out of so that it can be statically published? Or would
you rather the constant dynamic tete-a-tete with personalised
and customised responses from not only myself but plenty of
knowledgeable others (who will surely jump on my "technical
inaccuracies", so that the information you're getting is being
developed "open source" - under a "thousand eyes" at least - to
weed out all the bugs and misleading bits from what is said...a
static website is never so graced with this many "bug reports"
or gets updated fast enough...here, we all automatically do this
stuff without even thinking or blinking ;)?

> I think I even can set up such a forum for you if you like.

Oh, the paragon, the acme, the pinnacle of self-absorbed ego: my
own forum...I'll save everyone some time and effort: It may get,
ooh, 15 or 20 responses at first...then down to 10...then down
to 5...then even you'll get bored and wander off...so will
I...empty forum sitting around doing nothing...I mean,
"alt.os.assembly" is an obscure enough topic but "no topic at
all, just want to have my own forum"? Purpose must precede that
pride...it's got to be "for" something or why on Earth would
anyone go there?

> I have the rights and tools to do so, but haven't bothered
> with it, so it just lies there, dead and unused. It might as
> well do something useful.

Yes; But it's yours...it should be for you...really, if I wanted
that kind of thing, then I could arrange it myself...I've
already had such offers more than once and I did once start up a
website for programming stuff but then the hosts just
vanished...I mean, what would be there? Articles, a forum and
some kind of blog thingy? Well, what's the difference between
that and here? Except that here you get everyone else's
articles, a much larger, livelier forum and the weirdest kind of
"merged" blog from everyone talking back and forth...feel free
to disagree, but isn't this already the superior method? A
stroll through the Google archives is like a thousand websites
all rolled into one and has that extra "interaction" level which
adds on strata no website ever has...

> You could have your own little forum for feedback of your work.

And I would _want_ to have this? A _focussed_ arena for every
pedant to attack _only_ me and what I may have done? No, thank
you..."safety in numbers", as they say ;)...

> With no COMMERCIALS whatsoever.

Well, you at least have been paying attention to get that most
important ingredient right...

Mind you, to stress, I'm not opposed to advertising in
itself...I'm opposed to all the constant, incessant, attacking,
_INVASIVE_ advertising in the modern world...not what it is -
that's reasonable - but the usual way it is rammed down people's
throats with contempt...now _that_ I don't dig at all...

> Even if none of your work is complete, it should be very
> interesting to see it, as it must be very clever for sure.

Don't forget to account for my "artistic" temperament...no "work
in progress" peeks...just sit there and model...you'll see the
portrait when it's finished and not before..."impact" and
"presentation" are all part of art - that business of "show" -
even if not on the Richter scale of science ;)...

> Maybe you already solved a few of the things that are ahead
> of me, and maybe I'd like to borrow a little.

I doubt it; And it would be cheating you to steal away the joy
of discovering things for yourself ;)...

> Could you do that? It would be extremely exciting.

_Could_ I do it? Yeah, I probably _could_..._will_ I? Aah, well,
that question has a slightly different long-winded reply that
I'll spare you, as you can guess the content...

You find this "extremely exciting"? Oh dear, oh dear...you
_really_ need to get out some more, you know? ;)...

Beth :)