TIP #189: Tcl Modules

From: Andreas Kupries (akupries_at_shaw.ca)
Date: 04/27/04


Date: Tue, 27 Apr 2004 21:19:59 +0000 (UTC)


 TIP #189: TCL MODULES
=======================
 Version: $Revision: 1.2 $
 Author: Andreas Kupries <akupries_at_shaw.ca>
               Jean-Claude Wippler <jcw_at_equi4.com>
               Jeff Hobbs <jeffh_at_activestate.com>
               Don Porter <dgp_at_users.sourceforge.net>
 State: Draft
 Type: Project
 Tcl-Version: 8.5
 Vote: Pending
 Created: Wednesday, 24 March 2004
 URL: http://purl.org/tcl/tip/189.html
 WebEdit: http://purl.org/tcl/tip/edit/189
 Post-History:

-------------------------------------------------------------------------

 ABSTRACT
==========

 This document describes a new mechanism for the handling of packages by
 the Tcl Core which differs from the existing system in important
 details and makes different trade-offs with regard to flexibility of
 package declarations and to access to the filesystem. This mechanism is
 called "Tcl Modules".

 BACKGROUND AND MOTIVATION
===========================

 The current mechanism for locating and loading packages employed by the
 Tcl core is very flexible, but suffers from a number of drawbacks as
 well. These are at least partially the result of the flexibility, and
 thus not easily solved without giving up something.

 One problem with the current mechanism is that it extensively searches
 the filesystem for packages, and that it has to actually read a file
 (/pkgIndex.tcl/) to get the full information for a prospective package.
 All of these operations take time. The fact that "index scripts" are
 able to extend the list of paths searched tends to heighten this cost,
 as it forces rescans of the filesystem. Installations where the
 directories in the /auto_path/ are large or mounted from remote hosts
 are hit especially hard by this (network delays). All of this together
 causes a slow startup of tclsh and Tcl-based applications.

 "*Tcl Modules*" on the other hand is designed with less flexibility in
 mind and to allow implementations to glean as much information as
 possible without having to perform lots of accesses to the filesystem.

 Additional benefits of the proposed design are a simplified deployment
 of packages, akin to the way starkits made application deployment
 simple, and from that an easier implementation and management of
 repositories.

 It does not come without penalties however.

     * The simplified design has no "index scripts". While this does
       away with extending the list of paths to search it also does away
       with the ability of packages to check preconditions, like the
       version of the currently executing Tcl interpreter. Dependencies
       of packages in module form on particular versions of Tcl have to
       be managed differently and outside of them.

     * "Tcl Modules" is defined to be an extension of the existing
       package mechanism and does /not/ replace it. This means that any
       failure to find a package as module /has to/ cause a fall back to
       the regular package mechanism. It also sets a limit on how much
       of our goals we can reach: searching for packages which are not
       installed will stay relatively slow, and dominated by the
       filesystem scan of the regular search. This implies that "Tcl
       Modules" will be best suited in installations where the number of
       regular packages is low, and contained in a small part of the
       overall filesystem.

       On the gripping hand the only regular packages required will be
       packages supporting the virtual filesystems employed by modules
       (more on that later), so a transformation of an installation
       based on a set of regular packages to the form above is quite
       feasible.

 SPECIFICATION
===============

 INTRODUCTION
--------------

 Modules are regular Tcl Packages, in a different guise. To ease
 explanations first a summary of the existing mechanism:

     * Packages are identified through "pkgIndex.tcl" files and the
       "index script" they contain. These files are read and define the
       "provide script", which tells Tcl how to actually load the
       package. In other words, whether to use the "source" or "load"
       command, which file to specify as an argument to that command,
       etc. However as "pkgIndex.tcl" contains a regular Tcl script it
       can do more than that and actually influence the environment,
       i.e. the package search itself, in several ways:

            * It may choose to not register the package if conditions
              for the package are not met, like being run by a too old
              version of Tcl.

            * It may extend the list of paths used to search for
              packages. This implies that a package is able to modify
              the behaviour of the package search (usually extend the
              search) even before it is loaded, and even if it will not
              be loaded at all.

 The above is very flexible, but comes at a price. The filesystem is not
 only searched, but files have to be read as well to build up the
 in-memory index of packages. And this is iterated as well if index
 files change/extend the list of paths to search.

 Tcl Modules simplifies the above considerably, by cutting down on the
 number of indirections involved. It only searches for module files, and
 records their location, but does not read them. The search is only
 performed when required, on a limited part of the filesystem. This
 makes locating and importing packages in module form easier and faster.
 The price is that packages in module form cannot prevent registration
 in an interpreter not of their choice, nor can they influence the
 package search itself before they are actually used.

 The remainder of this document will cover the following topics:

     * What constitutes a Tcl Module ?

     * How are they found ?

     * How are they indexed, i.e. entered into the package database ?

 MODULE DEFINITION
-------------------

 A Tcl Module is a Tcl Package contained in a /single/ file. This file
 has to be *source*able. In other words, a Tcl Module is always imported
 via:

    source module_file

 The "load" command is not directly used. This restriction is not an
 actual limitation, as one might believe. Ever since 8.4 the Tcl *source*
 command reads only until the first ^Z character. This allows us to
 combine an arbitrary Tcl script with arbitrary binary data into one
 file, where the script processes the attached data in any way it
 chooses to fully import and activate the package. Please read
 [TIP #190] "Implementation Choices for Tcl Modules" for more
 explanations of the various choices which are possible.
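
 The effect of the ^Z rule can be tried out with a few lines of Tcl.
 The following is only a minimal sketch: the file name and the payload
 are made up for illustration, and it shows just one way a script could
 reach its attached data, not the mechanism any particular module uses.

```tcl
# Build a demo file: a Tcl script, then ^Z, then an arbitrary payload.
# File name and payload are hypothetical, chosen only for this example.
set ctrlZ [format %c 26]
set mf [file join [pwd] payload-demo-1.0.tm]
set out [open $mf w]
fconfigure $out -translation binary
puts -nonewline $out "set ::payloadDemoLoaded 1\n${ctrlZ}ARBITRARY-DATA"
close $out

# source stops reading at ^Z, so the payload is never parsed as Tcl.
source $mf

# The data after ^Z is still reachable by reading the file in binary mode.
set in [open $mf r]
fconfigure $in -translation binary
set data [read $in]
close $in
set payload [string range $data [expr {[string first $ctrlZ $data] + 1}] end]
puts $payload        ;# ARBITRARY-DATA
file delete $mf
```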

 The name of a module file has to match the regular expression

    ([[:alpha:]][:[:alnum:]]*)-([[:digit:]].*)\.tm

 The first capturing group provides the name of the package, the second
 its version. In addition to matching the pattern the extracted version
 number must not raise an error when used in the command

    package vcompare $version 0

 This additional check has several benefits. The reg-exp pattern is a
 bit simpler, and the full version check is based on the official
 definition of version numbers used by the Tcl core itself.
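
 The two checks can be combined into one small helper. This is only a
 sketch: the name parseModuleName is hypothetical and not part of the
 proposed API, the pattern is anchored with ^ and $ so that the whole
 file name has to match, and only the last path component is looked at
 (the directory part of a module's name is handled by the search).

```tcl
# Hypothetical helper: split a module file name into package name and
# version. Returns a two-element list {name version}, or an empty list
# if the file name does not denote a valid module.
proc parseModuleName {filename} {
    set pattern {^([[:alpha:]][:[:alnum:]]*)-([[:digit:]].*)\.tm$}
    if {![regexp -- $pattern [file tail $filename] -> name version]} {
        return {}
    }
    # Let the core decide whether the version number is well-formed.
    if {[catch {package vcompare $version 0}]} {
        return {}
    }
    return [list $name $version]
}

puts [parseModuleName base64-2.3.tm]    ;# base64 2.3
puts [parseModuleName base64-2.x.tm]    ;# empty: not a valid version
puts [parseModuleName README.txt]       ;# empty: pattern does not match
```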

 FINDING MODULES
-----------------

 Remember the check for a valid module in the last section, and notice
 that any filename matching this name pattern is going to be treated by
 the TM system as if it's a Tcl module, whether it really is or not.
 This means it's a bad idea for any non-Tcl module files that might
 match that pattern to end up in a directory where TM will be scanning.
 This suggests that the directory tree for storing Tcl modules ought to
 be something separate from other parts of the filesystem. This further
 implies that a new search path over just these separate storage areas
 would be better than Yet Another use of /$::auto_path/.

 Therefore: Modules are searched for in all directories listed in the
 result of the command "::tcl::tm::path list" (See also section 'API to
 "Tcl Modules"'). This is called the "Module path". Neither
 "/auto_path/" nor "/tcl_pkgPath/" are used.

 All directories on the module path have to obey one restriction:

     * For any two directories neither is an ancestor directory of the
       other.

 This is required to avoid ambiguities in package naming. If for example
 the two directories

    foo/
    foo/cool

 were on the path a package named 'cool::ice' could be found via the
 names 'cool::ice' or 'ice', the latter potentially obscuring a package
 named 'ice', unqualified.
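
 The restriction can be checked by comparing normalized path
 components. The sketch below is an assumption about how such a check
 might look; the command names isAncestor and pathsConflict are
 hypothetical, and the comparison is case-sensitive, which may be too
 strict on case-insensitive filesystems.

```tcl
# Hypothetical helper: is directory $a a proper ancestor of directory $b?
proc isAncestor {a b} {
    set pa [file split [file normalize $a]]
    set pb [file split [file normalize $b]]
    if {[llength $pa] >= [llength $pb]} {
        return 0
    }
    # Every component of $a has to match the leading components of $b.
    foreach ea $pa eb [lrange $pb 0 [expr {[llength $pa] - 1}]] {
        if {$ea ne $eb} {
            return 0
        }
    }
    return 1
}

# Two directories conflict if either one is an ancestor of the other.
proc pathsConflict {a b} {
    return [expr {[isAncestor $a $b] || [isAncestor $b $a]}]
}

puts [pathsConflict foo foo/cool]   ;# 1
puts [pathsConflict foo bar]        ;# 0
```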

 Before the search is started the name of the requested package is
 translated into a partial path, using the following algorithm:

     * All occurrences of '::' in the package name are replaced by the
       appropriate directory separator character for the platform we are
       on. For Unix for example this is '/'.

 Example:

     * The requested package is /encoding::base64/. The generated
       partial path is

    encoding/base64
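
 In Tcl the translation is a single *string map*. The command name
 partialPath below is hypothetical and used for illustration only.

```tcl
# Hypothetical helper: translate a package name into a partial path by
# replacing every '::' with the platform's directory separator.
proc partialPath {pkgname} {
    return [string map [list :: [file separator]] $pkgname]
}

puts [partialPath encoding::base64]   ;# encoding/base64 on Unix
```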

 After this translation the package is looked for in all module paths,
 by combining them one-by-one, first to last, with the partial path to
 form a complete search pattern. The exact pattern and mechanism is left
 unspecified, giving the implementation freedom of choice what glob
 searches to perform, how many of them, and when.

 Independent of that the implemented algorithm has to reject all files
 where the filename does not match the regular expression given in the
 previous section. For the remaining files "provide scripts" are
 generated and added to the *package ifneeded* database.

 The algorithm has to fall back to the previous unknown handler when
 none of the found module files satisfied the request. If the request
 was satisfied no fall-back is required.

 PROVIDE AND INDEX SCRIPTS
---------------------------

 Packages in module form have no control over the "index" and "provide
 script"s entered into the package database for them. For a module file
 /MF/ the "index script" is

    package ifneeded PNAME PVERSION [list source MF]

 and the "provide script" embedded in the above is

    source MF

 Both package name *PNAME* and package version *PVERSION* are extracted
 from the filename *MF* according to the definition below:

    MF = /module_path/PNAME'-PVERSION.tm

 Where *PNAME'* is the partial path of the module as defined in section
 'Finding Modules' above, which is translated into *PNAME* by changing
 all directory separators to '::', and *module_path* is the path, from
 the list of paths to search, under which the module file was found.
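
 Put together, the search in one module path and the generation of the
 "index scripts" might look like the sketch below. It is an
 illustration under assumptions, not the reference implementation: the
 command name indexScriptsFor and the demo directory are hypothetical,
 only one module path is searched, and the scripts are returned instead
 of being evaluated.

```tcl
# Hypothetical helper: return the "index script" for every valid module
# file found for package $pkgname below the module path $root.
proc indexScriptsFor {root pkgname} {
    set pattern {^([[:alpha:]][:[:alnum:]]*)-([[:digit:]].*)\.tm$}
    set partial [string map {:: /} $pkgname]
    set dir  [file dirname [file join $root $partial]]
    set base [file tail $partial]
    set scripts {}
    foreach mf [lsort [glob -nocomplain -directory $dir -types f -- $base-*.tm]] {
        # Reject files whose name does not follow the module pattern.
        if {![regexp -- $pattern [file tail $mf] -> name version]} continue
        # Guard against case-insensitive filesystems matching other names.
        if {$name ne $base} continue
        if {[catch {package vcompare $version 0}]} continue
        lappend scripts \
            [list package ifneeded $pkgname $version [list source $mf]]
    }
    return $scripts
}

# Demo in a scratch directory below the current directory.
set root [file join [pwd] tm-index-demo]
file mkdir [file join $root encoding]
close [open [file join $root encoding base64-2.3.tm] w]
close [open [file join $root encoding base64-oops.tm] w]   ;# rejected
foreach script [indexScriptsFor $root encoding::base64] {
    puts $script
}
file delete -force $root
```

 A real handler would evaluate each of these scripts and then let
 *package require* pick the best version.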

 /Note/ that we are here creating a connection between package names and
 paths. Tcl is case-sensitive when it comes to comparing package names,
 but there are filesystems which are not, like NTFS. Luckily these
 filesystems do store the case of the name, despite not using the
 information when comparing.

 Given the above we allow the names for packages in Tcl modules to have
 mixed-case, but also require that there are no collisions when
 comparing names in a case-insensitive manner. In other words, if a
 package 'Foo' is deployed in the form of a Tcl Module, packages like
 'foo', 'fOo', etc. are not allowed anymore.

 Regular packages have no problem with the names of their files as their
 entry point has a standard name ("/pkgIndex.tcl/") and its contents
 can be adjusted according to the filesystem they are stored in.

 API TO "TCL MODULES"
----------------------

 "Tcl Modules" is implemented in Tcl, as a new handler command for
 *package unknown*. This command calls the previously installed handler
 when its own search fails, thereby ensuring proper fall-back to the
 regular package search.

 All code and data structures implementing "Tcl Modules" reside in the
 namespace "/::tcl::tm/".

 A namespace variable holds the list of paths to search for modules, but
 is not officially exported. All access to this variable is done through
 the following public commands:

     * *::tcl::tm::path add* /PATH/

       The path is added at the head of the list of module paths.

       The command enforces the restriction that no path may be an
       ancestor directory of any other path on the list. If the new path
       violates this restriction an error will be raised.

       If the path is already present as is, no error will be raised
       and no action will be taken.

       Paths are searched in the order of their appearance in the list.
       As they are added to the front of the list they are searched in
       reverse order of addition. In other words, the paths added last
       are looked at first.

     * *::tcl::tm::path remove* /PATH/

       Removes the path from the list of module paths. The command is
       silently ignored if the path is not on the list.

     * *::tcl::tm::path list*

       Returns a list containing all registered module paths, in the
       order that they are searched for modules.
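
 A short session with the three commands might look as follows. This
 sketch assumes an interpreter that implements this proposal (Tcl 8.5
 or later) and a current directory that is not itself inside a
 registered module path; the directory name tm-demo is made up and does
 not need to exist.

```tcl
# Hypothetical directory, used only for this demonstration.
set demo [file join [pwd] tm-demo]

::tcl::tm::path add $demo
puts [lindex [::tcl::tm::path list] 0]    ;# the path just added

# A subdirectory of a registered path violates the ancestor rule.
if {[catch {::tcl::tm::path add [file join $demo sub]} msg]} {
    puts "rejected: $msg"
}

::tcl::tm::path remove $demo
::tcl::tm::path remove $demo              ;# silently ignored
```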

 We do /not/ provide APIs for rescanning directories, clearing internal
 state and such. The official interface to this functionality is
 "package forget" and special interfaces are neither required nor
 desirable.

 DISCUSSION
============

 RESTRICTION TO "SOURCE"
-------------------------

 This has already been discussed in the specification above.

 For more discussion I again refer to [TIP #190] "Implementation Choices
 for Tcl Modules" which explains the various implementation choices in
 much more detail.

 PRECONDITIONS
---------------

 It has already been mentioned in section 'Background and Motivation'
 that preconditions in "index scripts" are lost, one of the penalties of
 the simplified scheme specified here.

 Their existence was most important to installations with multiple
 versions of Tcl coexisting with each other as they could share the
 directory hierarchy containing packages between the various Tcl cores.
 This is not possible anymore, at least not in a simple manner.

 For the majority of installations however, i.e. those with only one
 version of Tcl installed, or controlled environments like the inside of
 starkits and starpacks, this loss is irrelevant and of no consequence.

 For more discussion please see [TIP #191] "Managing Tcl Packages and
 Modules in a Multi-Version Environment" which explains the various
 choices a sysadmin has in much more detail.

 PACKAGE METADATA
------------------

 An area possibly made harder by Tcl Modules is the storage and query of
 package metadata. [TIP #59] was one way of handling such information,
 by storing it in the binary library of packages which have one.
 Another approach was to store it in the package index script, using a
 hypothetical *package about* command.

 The latter approach has the definite advantage that it was possible to
 query the database of metadata for a particular package without having
 to actually load said package, as a load may fail if the Tcl shell used
 to query the database does not fulfil the preconditions for that
 package.

 Both approaches listed above assume that it makes sense to query the
 database of metadata for all installed packages from a plain Tcl shell.
 In other words, to use the standard Tcl shell also as the tool to
 directly manage an installation.

 It is possible to extend the proposal made in this document to handle
 metadata as well. We already reserved the namespace *::tcl::tm* for use
 by us, so it is no big problem to extend the public API with commands
 to locate all installed packages, their metadata, and to perform
 queries based on this. This will require an additional specification
 of how metadata is stored in/by Tcl Modules, and it will have to be
 understood that these extended management operations can take
 considerably more time than a *package require*, as they will have to
 scan all defined search paths and all their sub directories for Tcl
 Modules, and have to extract the metadata itself as well.

 DEPLOYMENT
------------

 The fact that a Tcl Module consists only of a single file makes its
 deployment quite easy. We only have to ensure correct placement in one
 of the searched directories when installing it locally, but nothing
 more.

 Regarding the usage of Tcl Modules in a wrapped application please see
 [TIP #190] "Implementation Choices for Tcl Modules". This is highly
 dependent on the implementation chosen for a specific Tcl Module and
 thus not discussed here, but in the referred document.

 PACKAGE REPOSITORIES
----------------------

 At a very basic level, the physical storage, any directory tree
 containing properly placed files for a number of modules can serve as a
 package repository for the modules in it. In other words, from that
 point of view an installation is virtually indistinguishable from a
 repository, and their creation and maintenance is very easy.

 Note however that the higher levels of a repository, like indexing
 package metadata in general, or dependency tracking in particular,
 licensing, documentation, etc. are not addressed by this document.

 These require standards for the format and content of package
 metadata, which this document will not deal with.

 DEFAULTS
----------

 The default list of paths on the module path is computed by a tclsh as
 follows, where /X/ is the major version of the Tcl interpreter and /y/
 is less than or equal to the minor version of the Tcl interpreter.

     * System specific paths

            * *file normalize* [*info library*]/../tcl/X///X/./y/

              In other words, the interpreter will look into the
              directory named after its major version, and there into
              one subdirectory for each minor version less than or
              equal to its own minor version.

              Example: For Tcl 8.4 the paths searched are

                   * [*info library*]/../tcl8/8.4

                   * [*info library*]/../tcl8/8.3

                   * [*info library*]/../tcl8/8.2

                   * [*info library*]/../tcl8/8.1

                   * [*info library*]/../tcl8/8.0

              This definition assumes that a package defined for Tcl
              /X.y/ can also be used by all interpreters which have the
              same major number /X/ and a minor number greater than /y/.

            * *file normalize* /EXEC//tcl/X///X/./y/

              Where /EXEC/ is [*file normalize* [*info
              nameofexecutable*]/../lib] or [*file normalize*
              [*::tcl::pkgconfig get* libdir,runtime]]

              This set of paths is handled equivalently to the set
              coming before, except that it is anchored in
              /EXEC_PREFIX/. For a build with /PREFIX/ = /EXEC_PREFIX/
              the two sets are identical.

     * Site specific paths.

            * *file normalize* [*info library*]/../tcl/X//site-tcl

     * User specific paths.

            * *$::env*(TCL/X/./y/_TM_PATH)

              A list of paths, separated by either *:* (Unix) or *;*
              (Windows). This is user and site specific as this
              environment variable can be set not only by the user's
              profile, but by system configuration scripts as well.

              These paths are seen and therefore shared by all Tcl
              shells in the *$::env*(PATH) of the user.

              Note that /X/ and /y/ follow the general rules set out
              above. In other words, Tcl 8.4 for example will look at
              these 5 environment variables

                   * *$::env*(TCL8.4_TM_PATH)

                   * *$::env*(TCL8.3_TM_PATH)

                   * *$::env*(TCL8.2_TM_PATH)

                   * *$::env*(TCL8.1_TM_PATH)

                   * *$::env*(TCL8.0_TM_PATH)

 /All/ the default paths are added to the module path, even those paths
 which do not exist. Non-existent paths are filtered out during actual
 searches. This enables a user to create one of the paths searched when
 needed and all running applications will automatically pick up any
 modules placed in them.

 The paths are added in the order in which they are listed above, and
 for lists of paths defined by an environment variable in the order
 they are found in the variable.
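
 The version-dependent parts of the defaults follow one simple rule,
 counting the minor version down to 0. The helpers below are a sketch
 only: the command names and the base directory /usr/lib are
 hypothetical, and the results are listed in the order used in the
 examples above.

```tcl
# Hypothetical helper: version-specific directories for Tcl X.y below
# the directory $base, e.g. [file normalize [info library]/..].
proc versionDirs {base major minor} {
    set dirs {}
    for {set y $minor} {$y >= 0} {incr y -1} {
        lappend dirs [file join $base tcl$major $major.$y]
    }
    return $dirs
}

# Hypothetical helper: names of the environment variables consulted.
proc tmEnvNames {major minor} {
    set names {}
    for {set y $minor} {$y >= 0} {incr y -1} {
        lappend names TCL$major.${y}_TM_PATH
    }
    return $names
}

puts [versionDirs /usr/lib 8 2]
# /usr/lib/tcl8/8.2 /usr/lib/tcl8/8.1 /usr/lib/tcl8/8.0
puts [tmEnvNames 8 2]
# TCL8.2_TM_PATH TCL8.1_TM_PATH TCL8.0_TM_PATH
```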

 INSTALLATION
--------------

 The installation of a Tcl module for a particular interpreter is
 basically done like this:

    #! /path/to/chosen/tclsh
    # First argument is the name of the module.
    # Second argument is the base filename
    set mpaths [::tcl::tm::path list]
    # ... remove all paths the user has no write permissions for.
    # ... throw an error if there are no paths left.
    # ... provide the user with some UI if more than one path is left
    # ... so that she can select the path to use.
    set selmpath [ui_select $mpaths]
    # Directory for the module below the selected path.
    set destdir [file join $selmpath \
        [file dirname [string map {:: /} \
        [lindex $argv 0]]]]
    # Create the directory first, in case it does not exist yet.
    file mkdir $destdir
    file copy [lindex $argv 1] $destdir

 GLOSSARY
==========

 The following terms and definitions are used throughout the document

     * /index script/

       A script used to index a package, or not. Usually contained in a
       file named "/pkgIndex.tcl/". Can check preconditions for a
       package and contains package specific code for setting up the
       package specific /provide script/.

     * /provide script/

       This is a package specific script and tells Tcl exactly how to
       import it. In the existing package system it is generated and
       registered by the /index script/. Tcl Modules on the other hand
       generates it based on information gleaned from filenames.

 REFERENCE IMPLEMENTATION
==========================

 A reference implementation is available in Patch 942881
 [<URL:http://sf.net/tracker/?func=detail&aid=942881&group_id=10894&atid=310894>]

 QUESTIONS
===========

 COMMENTS
==========

 [ Add comments on the document here ]

 COPYRIGHT
===========

 This document has been placed in the public domain.

-------------------------------------------------------------------------

 TIP AutoGenerator - written by Donal K. Fellows
