Re: fconfigure -translation binary conversion



yahalom <yahalome@xxxxxxxxx> wrote:
PS: for further diagnosis, do the following:
  puts [string length $utf8Str]
   (right after you get the value from the DB)
If the length matches the number of (Hebrew)
characters, then you actually have a Tcl-internal formatted
string, and you need exactly one conversion step: either through
the "encoding" command, or by *not* setting the channel to binary.

Just one more explanation on this:
utf-8 is *not* the same as unicode
utf-8 is just one of several ways to convert
a string of unicode characters to a string of octets (bytes).

Other encodings exist as well, but they are either less
commonly used or serve special needs.
(utf-7 and ucs-2 are two examples, but you can just as well
forget about them right away :-)

Tcl's strings can be used in two ways:
1) A string can contain unicode chars; then its string length equals
the number of characters.
2) A string can contain octets. As such it can hold the utf-8
encoding of some international text, where, except for the
basic characters (latin letters, digits, ...), most characters
are represented by a sequence of two or three octets.
That's why string length gives a bigger result.
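
To make the difference visible, here is a small sketch. The four
Hebrew letters are my own example, written as \u escapes so the
result does not depend on how the script file itself is encoded.

```tcl
# Four Hebrew letters (shin, lamed, vav, final mem), written as
# \u escapes so the script's own file encoding does not matter.
set chars "\u05e9\u05dc\u05d5\u05dd"

# Type 1): a string of unicode characters.
set nChars [string length $chars]

# Type 2): the same text as utf-8 octets, one octet per element.
set octets [encoding convertto utf-8 $chars]
set nOctets [string length $octets]

puts "characters: $nChars, utf-8 octets: $nOctets"
# prints: characters: 4, utf-8 octets: 8
```

Each of these letters takes two octets in utf-8, hence 8 instead of 4.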

Of course, what a given string variable contains depends only on
what got stored in it. Your problem now is: where did the strings come from?

If, in your script, you have a line like
set str "...[some hebrew chars]..."
then, when the script is sourced, the system encoding is used to
convert these strings into "1)"-format, so string length gives the
expected result.

If your database stores utf-8 values, but the db-interface you use
for tcl doesn't know that, it will likely feed "2)"-type data
into your variable.

As you probably can guess by now,
set x [encoding convertto utf-8 $x]
creates a type "2)" string from a type "1)" string, and
set x [encoding convertfrom utf-8 $x]
does it the other way round.
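
A minimal round-trip sketch of those two commands. The variable
fromDb is only a hypothetical stand-in for whatever your db-interface
returns; here it is built by hand from two Hebrew letters.

```tcl
# Hypothetical stand-in for a value fetched from the database:
# the utf-8 octets of two Hebrew letters (alef, bet).
set fromDb [encoding convertto utf-8 "\u05d0\u05d1"]

# Show the raw octets as hex.
binary scan $fromDb H* hex
puts "octets: $hex, length [string length $fromDb]"
# prints: octets: d790d791, length 4

# Type 2) -> type 1): decode the octets into unicode characters.
set text [encoding convertfrom utf-8 $fromDb]
puts "characters: [string length $text]"
# prints: characters: 2

# Type 1) -> type 2) again: the original octets come back.
set back [encoding convertto utf-8 $text]
puts "round trip ok: [string equal $back $fromDb]"
# prints: round trip ok: 1
```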

Now print out the string length of the data you get from the db,
to be sure which type you actually have, and, if it indeed
looks like type "2)", then just make it type "1)" with convertfrom.
Once it's in type "1)" you can safely combine it with strings from
your script. In the end, either you simply "puts" it to a
non-binary channel whose encoding is utf-8 (fconfigure -encoding),
so type "1)" gets converted to utf-8 transparently, or you explicitly
do the convertto and then "puts" it to a binary channel.

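Both output routes can be sketched like this. The file names are
made up for the demo, and I set the encoding of the non-binary
channel to utf-8 explicitly rather than relying on the system
encoding; take it as an illustration, not the only way to do it.

```tcl
set text "\u05e9\u05dc\u05d5\u05dd"

# Hypothetical scratch file names in the current directory.
set fileA "demo_text.out"
set fileB "demo_binary.out"

# Way 1: non-binary channel; Tcl encodes to utf-8 on output.
set f [open $fileA w]
fconfigure $f -encoding utf-8 -translation lf
puts -nonewline $f $text
close $f

# Way 2: binary channel; do the convertto yourself.
set f [open $fileB w]
fconfigure $f -translation binary
puts -nonewline $f [encoding convertto utf-8 $text]
close $f

# Helper for this demo: read a file back as raw octets.
proc readOctets {name} {
    set f [open $name r]
    fconfigure $f -translation binary
    set data [read $f]
    close $f
    return $data
}

set a [readOctets $fileA]
set b [readOctets $fileB]
puts "same bytes on disk: [string equal $a $b]"
# prints: same bytes on disk: 1

file delete $fileA $fileB
```

What you should not do is mix the two: applying convertto and then
writing to a utf-8 channel encodes the text twice.
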
> sorry for hard pressing on this staff but all this encoding really
> gets me crazy.

It got me too, about a year ago, and some regulars here were
really patient with me, which is one of the reasons I'm now
patient myself :-)
