Re: Extracting patterned filenames from [glob] without a loop - possible?

From: Bryan Oakley (bryan_at_bitmover.com)
Date: 12/31/03


Date: Tue, 30 Dec 2003 23:53:43 GMT

Phil Powell wrote:

> Consider this code:
>
> set fileList [glob -nocomplain -- /my/directory/*.*]
>
> if {$willArchiveTransferredFiles && [string length $fileList] > 0} {
> # FILTER OUT TO ONLY HAVE FILES ENDING IN _[timestamp] AS THESE ARE
> # GENERATED BACKUP FILES BY transfer.tcl TO BE ARCHIVED
> set blah [regsub -all \
> {((/[a-zA-Z0-9\-_\.]+)*[a-zA-Z0-9\-\._%~\+]+_[0-9]+\.[a-zA-Z0-9]+)} \
> $fileList "\\1 " fileList2]
> puts "blah = $blah"
> set fileList2 [string trim $fileList2]
> puts "fileList2: $fileList2"
> }
>
> The [regsub] does not work and I am absolutely unclear as to how to
> obtain the correct pattern to make it work.

A couple of obvious problems stem from the fact that you are using
string operations (string length, regsub) on what is obviously a list
($fileList). This is relatively safe, since lists can easily be converted
to strings, but it's generally a very bad idea to mix strings and lists
this way. Plus, you may get strings with extra quoting characters (braces
or backslashes) that you probably don't expect. You'll almost certainly
have a problem if any of your filenames have spaces in them.

When you then try to convert the string back into a list, or just use it
as a list and trust the implicit conversion (as I suspect you would),
you're asking for trouble.
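
To make the list-versus-string point concrete, here is a minimal sketch. The file names are made up for illustration; the second one contains a space, which is exactly the case that trips up string operations.

```tcl
# Hypothetical file names; the second one contains a space.
set fileList [list /my/directory/a_123.txt "/my/directory/my file_456.txt"]

# String view: Tcl adds braces around the element that contains a space,
# so a regsub over this string sees characters that are not in any name.
puts $fileList

# List view: two elements, regardless of the embedded space.
puts [llength $fileList]

# To test for "no files", count elements rather than characters.
if {[llength $fileList] > 0} {
    puts "have files"
}

# Pull elements out with list commands, not string surgery.
puts [lindex $fileList 1]
```

Using [llength] and [lindex] keeps the value a list the whole way through, so the quoting never becomes your problem.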

>
> Bottom line is that I want to get every file name from $fileList that
> has this kind of pattern:
>
> /dir1/dir2/.../dirN/filename_1918272728722.[some extension]
>
> The filename will always contain the following:
> 1) directory
> 2) alphanumeric filename
> 3) an underscore separator
> 4) timestamp
> 5) .[ext]
>
> Thus far I am unsuccessful at extracting this exact pattern from
> $fileList. I am trying very hard to avoid using a loop since the
> number of files extracted from [glob] into $fileList could be in the
> hundreds and that will horrifically clog processing time on the
> production box by doing this.

This doesn't make sense. Tcl can easily loop through thousands of
filenames without 'horrifically clogging' a system. In fact, it should be
able to process tens of thousands of filenames in under a second. Why do
you think a loop will clog the system when processing only a few hundred
files? Are you working on a tiny system (say, a PDA or embedded
processor), or do you have some other constraint you haven't told us about?
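
If you want to measure rather than guess, something along these lines should do. This is only a sketch: the file names are synthetic, countBackups is a hypothetical helper, and the pattern is my reading of your description, not a tested spec. The number you get depends entirely on your machine.

```tcl
# Build a synthetic list of 50,000 hypothetical file names.
set fileList {}
for {set i 0} {$i < 50000} {incr i} {
    lappend fileList /my/directory/file${i}_1072828423.bak
}

# Hypothetical helper: count the names that end in _<digits>.<ext>.
proc countBackups {files} {
    set n 0
    foreach f $files {
        if {[regexp {_[0-9]+\.[a-zA-Z0-9]+$} $f]} {
            incr n
        }
    }
    return $n
}

# [time] runs the script once and reports microseconds per iteration.
puts [time {countBackups $fileList} 1]
```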

>
> I want to ensure that this code is as streamlined and clean as well as
> fully functional as it should be, but for now it fails and I am unsure
> as to how to fix it. Suggestions welcome.

Use a loop. A naive implementation I just wrote sped through a list of
50,000 filenames in about 800 milliseconds. Admittedly I have a fast
machine, but by your own account you only need to process files "in
the hundreds". Even on a slow box I'd guess that processing several
hundred files would take just a fraction of a second.
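
For illustration, a loop along these lines would do the filtering. It is a sketch, not the implementation I timed: filterBackups is a hypothetical name, the sample file names are invented, and the pattern assumes the rules as you stated them (alphanumeric name, underscore, numeric timestamp, dot, extension).

```tcl
# Hypothetical helper: keep only names shaped like name_<timestamp>.<ext>.
proc filterBackups {files} {
    set result {}
    foreach f $files {
        # Match against the last path component only; the directory
        # part can then contain anything, including spaces.
        if {[regexp {^[a-zA-Z0-9]+_[0-9]+\.[a-zA-Z0-9]+$} [file tail $f]]} {
            lappend result $f
        }
    }
    return $result
}

# Invented sample data standing in for the result of [glob].
set fileList [list \
    /my/directory/report_1918272728722.txt \
    /my/directory/report.txt \
    "/my/directory/my notes_1072828423.log" \
    /my/directory/data_1072828423.csv]

set fileList2 [filterBackups $fileList]
puts "fileList2: $fileList2"
```

In your script you would pass the [glob] result straight in, e.g. filterBackups [glob -nocomplain -- /my/directory/*.*]. Because the result is built with [lappend], it stays a proper list. If you really want to avoid writing the loop yourself, Tcl 8.4's [lsearch -all -inline -regexp] can return the matching elements in one call, though it still has to visit every element internally.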


