Re: How can I use tcl to read files written in GBK or GB18030 encoding?



Larry W. Virden wrote:
> I know that Tcl has quite a large list of encodings that it supports.
> However, I've a request for guidance by someone who needs to read
> files using either GBK or GB18030 (I think these are alternate names
> for the same encoding...).
>
> Has anyone worked out what one needs to do for this?

According to Wikipedia ( http://en.wikipedia.org/wiki/GB_18030 ), it's
an encoding that mandates support for non-BMP characters. As Tcl currently
only supports the BMP, you're out of luck if you need full compliance,
but you might be able to get support for the BMP part of the encoding.
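
For the BMP part, it may be worth checking what the installed Tcl already
offers before building anything. The sketch below is only an illustration:
readFileAs is a hypothetical helper, and the names gb18030 and gbk are
assumptions that may not exist in a given build. cp936 (Microsoft's code
page for GBK) is normally shipped with Tcl, so it is listed as a fallback.

```tcl
# Hypothetical helper: read a whole file using the first encoding from
# 'candidates' that this Tcl installation actually knows about.
proc readFileAs {path candidates} {
    set known [encoding names]
    foreach enc $candidates {
        if {[lsearch -exact $known $enc] >= 0} {
            set chan [open $path r]
            # Decode the file's bytes with the chosen encoding on input.
            fconfigure $chan -encoding $enc
            set text [read $chan]
            close $chan
            return $text
        }
    }
    error "none of the encodings \"$candidates\" is available"
}
```

A call might look like: set text [readFileAs input.txt {gb18030 gbk cp936}]
where input.txt is a placeholder file name. Falling back to cp936 will not
decode the four-byte GB18030 sequences, so this covers only GBK-style data.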

You might come up with a limited mapping file to feed to the Tcl
encoding system, like those in the Tcl source dirs (there is a tool
in the tools/ subdir to convert the unicode.org .txt files to Tcl .enc
files). After that you need to put the .enc file in the right place
(or register it later, as the various texts for starkits describe).
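
Registering an extra directory could look roughly like the sketch below.
It assumes Tcl 8.5 or later, where the [encoding dirs] command exists, and
the directory name myenc and the file gb18030.enc inside it are
hypothetical; nothing here builds the mapping file itself.

```tcl
# Assumption: Tcl 8.5 or later, where [encoding dirs] is available.
# "myenc" is a hypothetical directory holding a hand-built gb18030.enc.
set encDir [file join [pwd] myenc]

# Put the extra directory in front of the existing search path.
encoding dirs [linsert [encoding dirs] 0 $encDir]

# Only rely on the encoding if Tcl actually found a matching .enc file.
if {[lsearch -exact [encoding names] gb18030] >= 0} {
    puts "gb18030 is available"
} else {
    puts "gb18030.enc not found in $encDir"
}
```

Once the name shows up in [encoding names], it can be used with
fconfigure -encoding or encoding convertfrom like any built-in encoding.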

Michael