Re: How can I use tcl to read files written in GBK or GB18030 encoding?



Larry W. Virden wrote:
I know that Tcl has quite a large list of encodings that it supports.
However, I've a request for guidance by someone who needs to read
files using either GBK or GB18030 (I think these are alternate names
for the same encoding...).

Has anyone worked out what one needs to do for this?

If there is documentation that maps the encoding to Unicode
characters, you can do it. The only thing you need to build is the
map "compiler", whose source is located at
/tools/encoding/txt2enc.c

The README in that directory explains the process, which looks
incredibly simple:

"
On Unix, use "make" to compile all the encoding files (*.txt,*.esc)
into the format that Tcl can use (*.enc). It is the caller's
responsibility to move the generated .enc files to the appropriate
place (the $TCL_LIBRARY/encoding directory).
"

I can't describe the format of the mapping infile, but it should
come straight from the Unicode standard.

Looks to be a simple matter of:
1) get or create the mapping infile
2) build txt2enc.c
3) use it to compile the mapping infile into a .enc outfile
4) move the .enc to where $TCL_LIBRARY/encoding resides
5) start tclsh, call 'encoding names', and observe your new addition
6) ???
7) make lots of money!
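
Once the .enc file is in place, reading a file should look roughly like
the sketch below. The proc name readEncodedFile, the file name input.txt
and the encoding name gbk are my own placeholders, not anything from the
Tcl distribution; I'm assuming the encoding name follows the name of the
.enc file you installed.

```tcl
# Read a whole file, decoding it with the named Tcl encoding.
# readEncodedFile is a hypothetical helper, not part of Tcl itself.
proc readEncodedFile {path encName} {
    # Fail early if the encoding was never installed (step 5 above).
    if {[lsearch -exact [encoding names] $encName] < 0} {
        error "encoding \"$encName\" is not known to this tclsh"
    }
    set f [open $path r]
    # Tell the channel how to translate the file's bytes to Unicode.
    fconfigure $f -encoding $encName
    # Make sure the channel is closed even if the read fails.
    set code [catch {read $f} result]
    close $f
    if {$code} {
        error $result
    }
    return $result
}
```

Usage would be something like 'puts [readEncodedFile input.txt gbk]',
with the real file and encoding names substituted.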

I wouldn't be surprised if the mapping infile exists already on
www.unicode.org somewhere for you to download...

ftp://ftp.unicode.org/Public/MAPPINGS/