Re: fconfigure -translation binary conversion
- From: Andreas Leitgeb <avl@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: 28 Feb 2008 16:55:44 GMT
yahalom <yahalome@xxxxxxxxx> wrote:
PS: For further diagnosis, do the following:
    puts [string length $utf8Str]
(right after you get the value from the DB)
If the length equals the number of (Hebrew) characters, then you
actually have a Tcl-internal (Unicode) string, and you need exactly
one step of conversion: either through the encoding command, or by
*not* setting the channel to binary.
Just one more explanation on this:
UTF-8 is *not* the same as Unicode.
UTF-8 is just one of several ways to convert
a string of Unicode characters to a string of octets (bytes).
Other encodings exist too, but they are less commonly used,
or specific to certain special demands.
(UTF-7 and UCS-2 are just two others, but you can just as well
forget about them right away :-)
Tcl's strings can be used in two ways:
1) They can contain Unicode characters; then string length
equals the number of characters.
2) They can be a string of octets. As such, a string can contain
the UTF-8 encoding of some international text, and except for the
basic characters (Latin letters, digits, ...) each character
is represented by a sequence of two or more octets (two for
Hebrew letters).
That's why string length gives a bigger result.
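
To make the difference concrete, here is a minimal sketch. The four
Hebrew letters are written as \u escapes so that the script's own file
encoding plays no role; the variable names are made up for illustration.

```tcl
# Type "1)": four Unicode characters (shin, lamed, vav, final mem).
set chars "\u05e9\u05dc\u05d5\u05dd"

# Type "2)": the same text as UTF-8 octets, two octets per Hebrew letter.
set octets [encoding convertto utf-8 $chars]

puts [string length $chars]    ;# 4
puts [string length $octets]   ;# 8
```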
Of course, what a given string variable contains depends only on
what got stored in it. Your problem now is where the strings came from.
If, in your script, you have a line like
    set str "...[some hebrew chars]..."
then, while the script is being sourced, the system encoding is used to
convert these strings into "1)"-format, so string length gives the
expected result.
If your database stores utf-8 values, but the db-interface you use
for Tcl doesn't know that, it's likely that it will feed "2)"-type data
into your variable.
As you can probably guess by now,
    set x [encoding convertto utf-8 $x]
creates a type "2)" string from a type "1)" string, and
    set x [encoding convertfrom utf-8 $x]
does it the other way round.
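
A small round-trip sketch of those two commands, again with \u escapes
and illustrative variable names. The hex dump is only there to show
which octets convertto produces.

```tcl
set text "\u05e9\u05dc\u05d5\u05dd"           ;# type "1)"

# 1) -> 2): encode the characters as UTF-8 octets.
set bytes [encoding convertto utf-8 $text]
binary scan $bytes H* hex
puts $hex                                     ;# d7a9d79cd795d79d

# 2) -> 1): decode the octets back into characters.
set back [encoding convertfrom utf-8 $bytes]
puts [expr {$back eq $text}]                  ;# 1
```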
Now print out the string length for the data you get from db,
to be sure which type you actually have, and, if it indeed
looks like type "2)", then just make it type "1)" with convertfrom.
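
A sketch of that check, and of what goes wrong if you skip it. The
variable fromDb merely stands in for whatever your db-interface returns;
here it is faked with convertto, which is an assumption about your setup.

```tcl
# Hypothetical stand-in for a value fetched from the database:
# UTF-8 octets that were never decoded (type "2)").
set fromDb [encoding convertto utf-8 "\u05e9\u05dc\u05d5\u05dd"]
set word   "\u05e2\u05d5\u05dc\u05dd"        ;# type "1)", 4 characters

# Mixing the types: every octet is counted as a character of its own.
set broken "$fromDb $word"
puts [string length $broken]                 ;# 13 (8 + 1 + 4)

# Converting first gives a proper 9-character string.
set fixed [encoding convertfrom utf-8 $fromDb]
set good "$fixed $word"
puts [string length $good]                   ;# 9 (4 + 1 + 4)
```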
Once it's in type "1)" you can safely combine it with strings from
your script. In the end, either simply "puts" it to a non-binary
channel, so that type "1)" is converted transparently to the
channel's encoding (use fconfigure -encoding utf-8 if that is not
already utf-8), or do the convertto explicitly and then "puts" it
to a binary channel.
> sorry for pressing so hard on this stuff, but all this encoding
> really drives me crazy.
It drove me crazy too, about a year ago, and some regulars here were
really patient with me, which is one of the reasons I'm now
patient myself :-)