Re: Workable encryption in Tcl??

From: R. T. Wurth (rwurth_at_att.net)
Date: 08/05/04


Date: Thu, 05 Aug 2004 01:53:22 GMT

In article <%y6Qc.84$0c.30@read1.cgocable.net>, snowzone5@hotmail.com
wrote:
> On Sat, 31 Jul 2004 at 00:46 GMT, Geoff Caplan <geoff@variosoft.com> wrote:
>
> > Run into a tiresome problem right at the start of my project.
>
> [tcl uncryption journey deleted :) ]
>
> you're pretty much where i was at a year ago. this being tcl i was
> expecting to be able to do something like:
>
> package require tcl_encryption
>
>
> set key [generate random key]
>
> set encrypted [encrypt $data, $key]
> set decrypted [decrypt $encrypted, $key]
>
>
> something simple like the above with twofish encryption AND available in
> tcllib would be just about perfect :)
>

I think it would be nice if pigs could fly, but I don't think that's
going to happen either.

The problem is a disconnect between the abstract and concrete
worlds. A Unicode-aware language like Tcl deals with the abstract
concept of characters. The outside world deals with concrete items,
namely data octets. Now, in fact, we know that Tcl stores these
abstract characters internally in a UTF-8-based encoding, but users
should not concern themselves with this or rely on it. Usually,
the distinction between the internal abstract concept of Unicode
characters and the external representation in the form of concrete
data octets is invisible to the user because the system
automatically uses a default system encoding to map between
character streams and octet streams when doing input and output. In
the rare cases when the programmer knows the system encoding does
the wrong thing, the programmer can set the channel to a specific
encoding, or make the channel transparent ([fconfigure] with
-encoding binary, or with -translation binary, which also turns
off end-of-line translation). A program using a transparent
channel takes on a heavy responsibility, namely the responsibility
of translating or interpreting the data. Tcl provides some helpful
commands, notably [binary scan], [binary format],
[encoding convertfrom], and [encoding convertto].
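
To make the character/octet distinction concrete, here is a minimal
sketch using only [encoding] and [binary scan]. The sample text is my
own choice for illustration; nothing here is specific to encryption
yet.

```tcl
# Abstract characters: "cafe" with an e-acute, a space, and a euro sign.
set text "caf\u00e9 \u20ac"

# Map the characters to concrete octets under an explicit encoding.
set octets [encoding convertto utf-8 $text]

# Look at the octets as hex digits.
binary scan $octets H* hex
puts "[string length $text] characters -> [string length $octets] octets: $hex"

# Map the octets back to characters using the same encoding.
set roundtrip [encoding convertfrom utf-8 $octets]
puts [expr {$roundtrip eq $text ? "round trip ok" : "round trip FAILED"}]
```

Six characters become nine octets here, which is exactly the kind of
detail an encryption routine has to be told about rather than guess.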

How does this relate to encryption? Very simply, even though the
encryption is invoked from within Tcl, the underlying algorithms are
defined as operating on (concrete) bit streams or octet streams, not
on (abstract) Unicode character streams. So, to encrypt a character
stream, the user must first decide what encoding should be used to
convert the abstract characters into concrete bit patterns, and then
apply the encryption algorithm. Which encoding should be used? I
could cavalierly say that as a US-based programmer, who, true to
stereotype, never learned a foreign language (unless you count 2
years in High School half-heartedly studying Latin), I don't care,
because nearly all the common encodings translate the standard
American English characters (\u0020 through \u007e) into 7-bit
USASCII (an old standard that mostly conforms to its international
equivalent, ISO 646). However, if you want to be
internationalized, you have to consider what encoding the recipient
of your data stream wants to see after they have decrypted it. If
you know the recipient is another Tcl application that you can
specify, the clear choice would be UTF-8, because it is able to
represent every Unicode character, whereas many other
encodings map a large part of the character set into the single
character '?'. On the other hand, if the recipient is the Notepad
application on an IBM-compatible PC running some flavor of
MS-Windows, you probably want one of the IBM CP-xxxx encodings.
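
The difference between a lossless and a lossy encoding is easy to
check. This is only a sketch: the helper name lossless is made up for
the example, and the [catch] is there because some newer Tcl releases
may raise an error for an unrepresentable character instead of
substituting '?'.

```tcl
# Hypothetical helper: does TEXT survive a trip through encoding ENC?
proc lossless {enc text} {
    # Some Tcl releases raise an error instead of substituting '?'.
    if {[catch {encoding convertto $enc $text} octets]} {
        return 0
    }
    return [expr {[encoding convertfrom $enc $octets] eq $text}]
}

# "naive" with an i-diaeresis, then a euro sign and the digit 5.
set text "na\u00efve \u20ac5"
foreach enc {utf-8 iso8859-1} {
    puts "$enc lossless: [lossless $enc $text]"
}
```

ISO 8859-1 copes with the accented letter but has no euro sign, so
whatever comes out the far end after decryption is not what went in.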

A similar issue arises with respect to keys. Encryption algorithms
deal with keys as a stream of bits or octets, while Tcl deals with
characters. So the question becomes how to collect the key from the
user and map it into a bit or octet stream. For DES, which takes
key data organized as eight octets, each comprising 7 data bits
and a parity bit, it might make sense to require the user to simply
enter USASCII characters only, or perhaps to enter the key as a
string of hex digits.
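
Both ways of collecting a key can be sketched in a few lines. The proc
names keyFromHex and keyFromAscii are hypothetical, and neither one
checks or sets DES parity bits; they only get the key from characters
into octets.

```tcl
# Hypothetical helper: turn a key typed as hex digits into raw octets.
proc keyFromHex {hex} {
    # Accept "a3 b5 c7 ..." as well as "a3b5c7..." by dropping blanks.
    set hex [string map {" " "" "\t" ""} $hex]
    if {![regexp {^([0-9A-Fa-f]{2})+$} $hex]} {
        error "key must be an even number of hex digits"
    }
    return [binary format H* $hex]
}

# Hypothetical helper: turn a printable US-ASCII key into raw octets.
proc keyFromAscii {phrase} {
    # The range "space through tilde" is \u0020 through \u007e.
    if {![regexp {^[ -~]*$} $phrase]} {
        error "key must contain printable US-ASCII characters only"
    }
    return [binary format a* $phrase]
}

set key [keyFromHex "01 23 45 67 89 ab cd ef"]
puts "key is [string length $key] octets long"
```

A DES front end would additionally insist on exactly eight octets.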

So, to me, a simplified interface to a library of low-level encryption
functions might be
  encrypt <<algorithm_selector>> ?-encoding <<encoding_selector>>?
      ?-keyencoding <<encoding_selector>>? ?--? <<key>> <<data>>

Where:
  ? ... ?                     indicates optional arguments
  << ... >>                   indicates data as noted below
  everything else             is taken literally

  <<algorithm_selector>>      is, for example, DES-CBC, DES-ECB, 3DES,
                              2fish, etc.
  <<encoding_selector>>       determines the encoding used to map
                              characters into data octets prior to
                              encryption. It would include the
                              usual suspects (see Tcl's man pages
                              for I/O operations, the [binary]
                              command, and the [encoding] command).
                              For -keyencoding, 'hexstring' and
                              'hexpairs' would also be allowed,
                              representing formats like
                              a3b5c7d9e0f1...0011, and "a3 b5 c7 d9
                              e0 f1 ... 00 11", respectively. Both
                              the -encoding and -keyencoding options
                              would be optional, and each would
                              default to the system encoding.
  --                          is an optional signal for the end of
                              option arguments, and is useful if the
                              key might be mistaken for an option.
  <<key>>                     is the key (as a character string,
                              unless -keyencoding binary is specified).
  <<data>>                    is the data to be encrypted (as a
                              character string, unless -encoding
                              binary is specified).

A similar decrypt function would be supplied.
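
Here is a rough sketch of how the front end of such a proc might look.
Everything in it is an assumption on my part: the cipher namespace, the
toOctets helper, and above all the xor-demo "algorithm", which is a
repeating-key XOR placeholder so the sketch runs on its own. It is NOT
encryption; a real version would dispatch to DES or Twofish primitives
instead.

```tcl
namespace eval cipher {}

# Placeholder algorithm (NOT secure): repeating-key XOR over octets.
proc cipher::xor-demo {keyOctets dataOctets} {
    binary scan $keyOctets  c* k
    binary scan $dataOctets c* d
    if {[llength $k] == 0} { error "empty key" }
    set out {}
    set i 0
    foreach byte $d {
        lappend out [expr {$byte ^ [lindex $k [expr {$i % [llength $k]}]]}]
        incr i
    }
    return [binary format c* $out]
}

# Map a character string to octets under the named encoding.
proc toOctets {enc value} {
    switch -- $enc {
        binary    { return $value }
        hexstring -
        hexpairs  { return [binary format H* [string map {" " ""} $value]] }
        default   { return [encoding convertto $enc $value] }
    }
}

proc encrypt {algorithm args} {
    set opt(-encoding)    [encoding system]
    set opt(-keyencoding) [encoding system]
    # Consume leading option/value pairs; "--" ends option processing.
    while {[llength $args] > 2} {
        set name [lindex $args 0]
        if {$name eq "--"} {
            set args [lrange $args 1 end]
            break
        }
        if {![info exists opt($name)]} break
        set opt($name) [lindex $args 1]
        set args [lrange $args 2 end]
    }
    if {[llength $args] != 2} {
        error "usage: encrypt algorithm ?-encoding enc?\
               ?-keyencoding enc? ?--? key data"
    }
    lassign $args key data
    return [cipher::$algorithm [toOctets $opt(-keyencoding) $key] \
                               [toOctets $opt(-encoding) $data]]
}

set ct [encrypt xor-demo -keyencoding hexstring -encoding utf-8 \
            -- a3b5c7d9 "caf\u00e9"]
binary scan $ct H* hex
puts "cyphertext octets: $hex"
```

The point of the sketch is the division of labour: the encodings are
resolved before the algorithm is called, so the algorithm itself only
ever sees octets.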

Note that the cyphertext output from encrypt (cyphertext input to
decrypt) must necessarily be a "binary" string, not a type that Tcl
handles all that well. In fact, about the only things I can think
of that make sense are to do some I/O operation through a file,
socket, or other channel configured as binary, or to convert it
through something like base-64 encoding or hex encoding that results
in regular ASCII characters, but of course such operations would
have the [binary] command at their heart. Don't even think of coming
anywhere within 10 meters of it with a string or list operator.
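
The hex route can be sketched with nothing but [binary]. The octets
below are an arbitrary stand-in for real cyphertext, chosen to include
a NUL and some high-bit values.

```tcl
# Stand-in for encrypt's output: arbitrary octets, not real cyphertext.
set cyphertext [binary format H* "00ff10a37f80"]

# Octets -> printable hex digits, safe to store or mail as text.
binary scan $cyphertext H* armored

# Printable hex digits -> the original octets, ready for decrypt.
set restored [binary format H* $armored]

puts "$armored ([string length $restored] octets)"

# To write the raw octets instead, make the channel transparent first:
#   set f [open secret.bin w]
#   fconfigure $f -translation binary
#   puts -nonewline $f $cyphertext
#   close $f
```

Hex doubles the size; base-64 is more compact but follows the same
pattern of armoring on the way out and un-armoring on the way in.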

I think earlier articles in this thread have given enough
information to show how a proc with this interface could be built
over the DES primitives. The implementation is left as an exercise
for the reader. :-)

-- 
 Rich Wurth / rwurth@att.net / Rumson, NJ  USA

