Re: Workable encryption in Tcl??

From: Mac A. Cody (maccody_at_castcom.net)
Date: 08/05/04


Date: Thu, 05 Aug 2004 04:00:23 GMT

R. T. Wurth wrote:
8<--------------------- snip! ---------------------------->8
> The problem is a disconnect between the abstract and concrete
> worlds. A Unicode-aware program, like Tcl, deals with the abstract
> concept of characters. The outside world deals with concrete items,
> namely data octets. Now, in fact, we know that Tcl stores these
> abstract characters using the concrete UTF-8 encoding, but users
> should not concern themselves about this or rely on it. Usually,
> the distinction between the internal abstract concept of Unicode
> characters and the external representation in the form of concrete
> data octets is invisible to the user because the system
> automatically uses a default system encoding to map between
> character streams and octet streams when doing input and output. In
> the rare cases when the programmer knows the system encoding does
> the wrong thing, the programmer can set the channel for a specific
> encoding, or set the channel to be transparent (-encoding binary
> is the relevant option, I think). A program using a transparent
> channel takes on a heavy responsibility, namely the responsibility
> of translating or interpreting the data. Tcl provides some helpful
> commands, notably [binary scan], [binary format],
> [encoding convertfrom], and [encoding convertto].
>
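
To make the character/octet distinction concrete, here is a minimal sketch using [encoding convertto], [encoding convertfrom], and [binary scan]. The sample string and variable names are my own illustration, nothing more.

```tcl
# Four abstract characters; the last one is U+00E9 (e with acute accent).
set text "caf\u00e9"

# Map the same characters to concrete octets under two encodings.
set utf8   [encoding convertto utf-8 $text]
set latin1 [encoding convertto iso8859-1 $text]

# Dump the octets as hex so the difference is visible.
binary scan $utf8   H* utf8hex
binary scan $latin1 H* latin1hex
puts "characters: [string length $text]"
puts "utf-8:      $utf8hex ([string length $utf8] octets)"
puts "iso8859-1:  $latin1hex ([string length $latin1] octets)"

# Going back requires naming the same encoding that produced the octets.
set back [encoding convertfrom utf-8 $utf8]
```

The same four characters become five octets in UTF-8 and four in ISO 8859-1, so whatever cipher sits downstream sees different input depending on the encoding chosen.
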
> How does this relate to encryption? Very simply, even though
> encryption is in the Tcl realm, the underlying algorithms are
> defined as operating on (concrete) bit streams or octet streams, not
> on (abstract) Unicode character streams. So, to encrypt a character
> stream, the user must first decide what encoding should be used to
> convert the abstract characters into concrete bit patterns, and then
> apply the encryption algorithm. Which encoding should be used? I
> could cavalierly say that as a US-based programmer, who, true to
> stereotype, never learned a foreign language (unless you count 2
> years in High School half-heartedly studying Latin), I don't care,
> because all the encodings correctly translate the standard American
> English characters (\u0020 through \u007e) into 7-bit USASCII (an
> obsolete standard that mostly conforms to an equivalent ISO standard
> whose number I forget). However, if you want to be
> internationalized, you have to consider what encoding the recipient
> of your data stream wants to see after they have decrypted it. If
> you know the recipient is another Tcl application that you can
> specify, the clear choice would be UTF-8, because it is able to
> represent every character in 16-bit Unicode, whereas many other
> encodings map a large part of the character set into the single
> character '?'. On the other hand, if the recipient is the Notepad
> application on an IBM-compatible PC running some flavor of MS-Windows,
> you probably want one of the IBM CP-xxxx encodings.
>
> A similar issue arises with respect to keys. Encryption algorithms
> deal with keys as a stream of bits or octets, while Tcl deals with
> characters. So, the issue becomes how to collect the key from the
> user and map it into a bit or octet stream. For DES,
> which takes key data organized as octet units comprising 7 data bits
> and a parity bit, it might make sense to require the user to simply
> enter USASCII characters only, or perhaps to enter the key as a
> string of hex digits.
>

The last approach appears to be the most direct. Conversion to binary
using the "binary format" command makes it a snap.
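
As a rough sketch of what I mean (the proc name hexKeyToBinary is made up for this post and is not part of TclDES), a hex-digit key can be turned into key octets like this:

```tcl
# Hypothetical helper: convert a key typed as hex digits into raw octets.
# Accepts both "a3b5c7d9" and "a3 b5 c7 d9" styles.
proc hexKeyToBinary {hex} {
    # Drop any whitespace between digit pairs.
    regsub -all {\s+} $hex "" hex
    # Insist on whole octets: a non-empty, even number of hex digits.
    if {![regexp {^([0-9A-Fa-f]{2})+$} $hex]} {
        error "key must be an even number of hex digits"
    }
    # H* packs the hex digits, high nibble first, into a binary string.
    return [binary format H* $hex]
}

set key [hexKeyToBinary "01 23 45 67 89 ab cd ef"]
puts "key length: [string length $key] octets"
```

Sixteen hex digits give the eight octets a single DES key needs; the validation step is my own addition so that a mistyped key fails loudly instead of being silently padded.
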

Except for the <<encoding_selector>> part, the following is essentially
the interface of TclDES.

> So, to me, a simplified interface to a library of low-level encryption
> functions might be
>     encrypt <<algorithm_selector>> ?-encoding <<encoding_selector>>?
>             ?-keyencoding <<encoding_selector>>? ?--? <<key>> <<data>>
>
> Where:
>     ? ... ?                 indicates optional arguments
>     << ... >>               indicates data as noted below
>     everything else         taken literally
>
>     <<algorithm_selector>>  is, for example, DES-CBC, DES-ECB, 3DES,
>                             2fish, etc.
>     <<encoding_selector>>   determines the encoding used to map
>                             characters into data octets prior to
>                             encryption. It would include the
>                             usual suspects (see Tcl's man pages
>                             for I/O operations, the [binary]
>                             command, and the [encoding] command).
>                             For the case of a -keyencoding,
>                             'hexstring' and 'hexpairs' would
>                             represent formats like
>                             a3b5c7d9e0f1...0011, and "a3 b5 c7 d9
>                             e0 f1 ... 00 11", respectively. These
>                             would be optional, and the defaults
>                             would be the system default encoding.
>     --                      is an optional signal for the end of
>                             option arguments, and is useful if the
>                             key might be mistaken for an option.
>     <<key>>                 is the key (as a character string,
>                             unless -keyencoding binary is specified).
>     <<data>>                is the data to be encrypted (as a
>                             character string, unless -encoding
>                             binary is specified).
> A similar decrypt function would be supplied.
> Note that the cyphertext output from encrypt (cyphertext input to
> decrypt), must necessarily be a "binary" string, not a type that Tcl
> handles all that well.

Yes. In fact, TclDES has to convert the binary strings into ASCII hex
before any serious bit crunching is done.
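
For anyone who has not done this before, the general technique looks like the sketch below. This is only an illustration of [binary scan], not the actual TclDES source, and the sample octets are invented.

```tcl
# Invented sample data: four raw octets.
set octets [binary format H* "a3b5c7d9"]

# Binary string -> ASCII hex digits (two per octet, high nibble first).
binary scan $octets H* hex
puts "hex:   $hex"

# Binary string -> list of unsigned integers, handy for bit operations.
# The "u" (unsigned) modifier needs Tcl 8.5 or later.
binary scan $octets cu* bytes
puts "bytes: $bytes"
```

Once the data is in hex or integer form, ordinary string and [expr] operations are safe to use on it.
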

> In fact, about the only thing I can think
> of that makes sense is to do some I/O operation through a file,
> socket, or channel configured as binary, or to convert them through
> something like base-64 encoding or hex encoding that results in
> regular ASCII characters, but of course such operations would have
> the [binary] command at their heart. Don't even think of coming
> anywhere within 10 meters of them with a string or list operator.

Very true.
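
A small sketch of both safe routes, assuming Tcl 8.6 or later (for [file tempfile] and [binary encode]); the "cyphertext" here is just four invented octets, not real cipher output:

```tcl
# Invented stand-in for cyphertext, including a NUL and high-bit octets.
set ciphertext [binary format H* "00ff1080"]

# Route 1: write and read it through a channel configured as binary,
# so no encoding or end-of-line translation touches the octets.
set fd [file tempfile path]
fconfigure $fd -translation binary
puts -nonewline $fd $ciphertext
close $fd

set fd [open $path rb]
set readBack [read $fd]
close $fd
file delete $path

# Route 2: wrap it in plain ASCII with base64, then unwrap it again.
set armored  [binary encode base64 $ciphertext]
set restored [binary decode base64 $armored]
puts "base64: $armored"
```

Either way the octets come back unchanged, which is the whole point of keeping string and list operators away from them.
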

> I think earlier articles in this thread have given enough
> information to show how a proc with this interface could be built
> over the DES primitives. The implementation is left as an exercise
> for the reader. :-)

Check out http://tcldes.sourceforge.net for the realization of
the proc!
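
For readers who would rather try the exercise first, here is a rough sketch of the option handling such a proc needs. Everything in it is hypothetical: the names toyCipher and toOctets are made up, the "toy" algorithm is a repeating-key XOR that is NOT secure and only stands in for a real cipher, and none of this is the TclDES API. It assumes Tcl 8.5 or later for [lassign].

```tcl
# Stand-in cipher (NOT secure): XOR data octets with the repeating key.
proc toyCipher {keyOctets dataOctets} {
    binary scan $keyOctets cu* k
    binary scan $dataOctets cu* d
    set out {}
    set i 0
    foreach byte $d {
        lappend out [expr {$byte ^ [lindex $k [expr {$i % [llength $k]}]]}]
        incr i
    }
    return [binary format c* $out]
}

# Map a character string to octets according to an encoding selector.
proc toOctets {selector value} {
    switch -- $selector {
        binary    { return $value }
        hexstring -
        hexpairs  { return [binary format H* [string map {" " ""} $value]] }
        default   { return [encoding convertto $selector $value] }
    }
}

proc encrypt {args} {
    set usage "encrypt algorithm ?-encoding enc? ?-keyencoding enc? ?--? key data"
    if {[llength $args] < 3} { error "usage: $usage" }
    set args [lassign $args algorithm]
    if {$algorithm ne "toy"} { error "only the toy algorithm exists in this sketch" }
    # Both encodings default to the system encoding, as proposed above.
    set enc    [encoding system]
    set keyenc [encoding system]
    # Anything before the final two arguments is treated as an option.
    while {[llength $args] > 2} {
        set args [lassign $args opt]
        switch -- $opt {
            -encoding    { set args [lassign $args enc] }
            -keyencoding { set args [lassign $args keyenc] }
            --           { break }
            default      { error "unknown option \"$opt\"" }
        }
    }
    if {[llength $args] != 2} { error "usage: $usage" }
    lassign $args key data
    return [toyCipher [toOctets $keyenc $key] [toOctets $enc $data]]
}

# XOR is its own inverse, so running the cyphertext back through with
# -encoding binary recovers the plaintext octets.
set ct [encrypt toy -encoding utf-8 -keyencoding hexstring a3b5c7d9 "caf\u00e9"]
set pt [encoding convertfrom utf-8 \
            [encrypt toy -encoding binary -keyencoding hexstring a3b5c7d9 $ct]]
```

Swapping toyCipher for calls into a real DES library is where the actual work lies.
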

Mac Cody

--
To respond via email, swap "CAST" and "COM" in my email address

