Re: OT: Unicode and vi(m). Was Re: Great SWT Program



In article <1189827631.738245.304280@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
<nebulous99@xxxxxxxxx> wrote:
> On Sep 13, 7:30 am, RedGrittyBrick <redgrittybr...@xxxxxxxxxxxxx>
> wrote:
>> digraph hu
>> digraph w:
>> ^VUFC12
>>
>> I guess this means that either
>>
>> 1) Vim is *not* one of the "Console-mode archaisms from the 70s".
>>
>> or
>>
>> 2) "Console-mode archaisms from the 70s simply cannot and will not ever
>> decently support unicode" is wrong.
>
> The key word here is "decently". These all look correct to me in
> Google Groups. (I won't vouch for their not being mangled when GG
> sends this followup though.) But you won't be able to see them
> properly when editing in text-mode, unless you switch to a charset
> with the right glyphs. Which means either you're hand-hacking stuff
> like "^VUFC12" or the *English* text nearby is unreadable...

Well, I was curious ....

Running console-mode vim (not gvim) in GNOME's terminal emulator
program on my Linux system, I saw all the characters in the post
by RedGrittyBrick. Not knowing all the relevant alphabets,
I can't be sure they appeared correctly, though they all looked
at least plausible. No idea how they look in Google Groups;
I guess I don't care enough to find out whether a search would
find RGB's post.

After spending a few minutes reading online help about digraphs,
I was able to enter the characters he mentions as being enterable
with digraphs, and there's a table readily accessible showing all
the ones that are currently available. There's also a mention of
"keymaps" that sounds like it could be useful.

Hm, this is getting interesting ....

> And I don't consider having to edit blind to be "decent" support for
> whatever it is I'm having to edit blind.
>
> For composing a post like the one I'm following up to, anything
> limited to displaying one code page at a time is crippling. And
> anything that isn't so limited is obviously not (whatever its
> ancestry) what was originally under discussion.

Well, I believe what was originally under discussion was the
combination of vim and trn, and now we know that vim probably
can do what's needed. (For those who care about such things,
I did initially specify vim and not vi.)

trn I'm not so sure about -- I'm guessing it will probably transmit
whatever file is produced by the text editor used to compose posts,
but it probably won't add appropriate headers. I'm guessing that
"appropriate headers" here means something along the lines of one
of the following, culled from my current archive of posts to cljp:

Content-Type: text/plain; charset="iso-8859-1"
Content-Type: text/plain; charset=UTF-8; format=flowed
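
As an illustration of what those two header lines declare, here is a
minimal sketch that parses them with Python's standard email package.
Python is an assumption on my part (it is not one of the tools discussed
in this thread), and the dummy "body" text and the describe() helper are
made up for the example.

```python
from email import message_from_string

# The two header lines from the archive, each followed by a blank line
# and a placeholder body so they parse as complete messages.
samples = [
    'Content-Type: text/plain; charset="iso-8859-1"\n\nbody\n',
    'Content-Type: text/plain; charset=UTF-8; format=flowed\n\nbody\n',
]

def describe(raw):
    """Return (media type, charset, format parameter) for one raw message."""
    msg = message_from_string(raw)
    return (
        msg.get_content_type(),      # e.g. "text/plain"
        msg.get_content_charset(),   # charset parameter, lowercased
        msg.get_param("format"),     # None when the parameter is absent
    )

results = [describe(raw) for raw in samples]
for item in results:
    print(item)
```

The charset parameter is what tells a reader how to turn the bytes of
the body back into characters; the quotes around the value are optional.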

Hm, I wonder if such headers could be added manually ....

Well, after doing some experiments in misc.test, it appears
that one can, and that with or without them it's possible to
post something containing characters other than 7-bit ASCII and
have them come out okay (as best I can tell anyway). My tools
only get confused when I put in such characters and then add a
Content-Type header specifying us-ascii, which seems utterly
reasonable.
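
Why a us-ascii label clashes with such characters can be shown with a
small sketch (again in Python, which is my assumption and not what was
used for the misc.test experiments; try_encode() is a hypothetical
helper). A character such as a-umlaut simply has no us-ascii encoding,
while iso-8859-1 and UTF-8 both have one, though with different bytes.

```python
text = "An a with an umlaut: ä"

def try_encode(s, charset):
    """Encode s in the given charset, or return None if it cannot be done."""
    try:
        return s.encode(charset)
    except UnicodeEncodeError:
        return None

ascii_bytes = try_encode(text, "us-ascii")     # not representable
latin1_bytes = try_encode(text, "iso-8859-1")  # a-umlaut is one byte
utf8_bytes = try_encode(text, "utf-8")         # a-umlaut is two bytes

print(ascii_bytes)
print(latin1_bytes)
print(utf8_bytes)
```

So a body containing these bytes but labelled us-ascii contradicts
itself, and a tool has to guess which of the other charsets was meant.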

I'm still curious, though, about standards. If I put in Unicode
characters, and add the header

Content-Type: text/plain; charset=UTF-8

does anyone know if this complies with whatever standards exist?
(I'm thinking an RFC somewhere -- I did make a quick attempt to
find out via Google searches, but without success.)
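
For comparison, here is a sketch of how a library builds such a header.
It assumes Python's standard email package, which is not one of the
tools in this thread; the Newsgroups and Subject values are
placeholders. It shows only what one implementation emits and is not a
statement about what the standards require for news posts.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Newsgroups"] = "misc.test"   # placeholder value
msg["Subject"] = "charset test"   # placeholder value

# Ask the library to label the body as UTF-8 text.
msg.set_content("An a with an umlaut: ä\n", charset="utf-8")

# Show the MIME headers the library generated for the body.
for name in ("MIME-Version", "Content-Type", "Content-Transfer-Encoding"):
    print(f"{name}: {msg[name]}")

# Reading the body back decodes it using the declared charset.
print(msg.get_content())
```

Note that the library adds a MIME-Version header alongside
Content-Type, which suggests the two are meant to travel together.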

I'll put in some of those Unicode characters here, as another
test ....

A copyright symbol (I hope): ®

An a with an umlaut (I hope): ä

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.