Re: Extracting patterned filenames from [glob] without a loop - possible?

From: Don Porter (dgp_at_email.nist.gov)
Date: 01/02/04


Date: 2 Jan 2004 17:09:27 GMT

In article <1cdca2a7.0401020829.6d8abf25@posting.google.com>, Phil Powell wrote:
> Bryan, I am going to say this as calmly as I can:
>
> THE [censored] PATTERN IS THE PROBLEM

Can we say something calmly in return?

As we've tried to explain many, many times, you are using the wrong
tool for the job. You have a list of filenames returned from [glob]
and you keep insisting on applying [regsub] to it. [regsub] is
a sophisticated tool for matching/replacing patterns in a *string*
but it is the wrong tool for working with a list.

Imagine someone complaining that hacking at a block of ice with
a screwdriver was not producing nice uniform chunks of ice. So
we suggest that an ice pick would be a better tool, and in return
someone screams back that the [censored] broken screwdriver is
the problem.

That's what this conversation looks like from our perspective.

And no, we won't help you fix your screwdriver. Read on.

> your solution to using a loop is fine except that when I try to match
> according to the original pattern of only looking for files with this
> pattern:
>
> /dir1/dir2/.../dirN/myFileNameWithAnthing_BunchOfNumbersAsUnixTimestamp.ext
>
> It never ever finds that pattern and fails, whereas were I to find
> that NEGATION of that pattern it works perfectly, regsub, loop, glob,
> blah, foo, whatever.

Hmmm... OK, that gets away from the list/string issue.

So, now, again, as I've asked at least twice already: please
describe for us precisely what patterns you are trying to match.
If you can't make a precise description in words, then it is
impossible to create a precise matching pattern in the [regexp]
language.

You originally posted this:

        set fileList [glob -nocomplain -- /my/directory/*.*]

Do you see that every file name in $fileList will begin with
/my/directory ? So do you see that there is no need whatsoever
to "match" against something arbitrary like "/dir1/dir2/.../dirN" ?

So that leaves just matching the file name. Your latest description
of the file name pattern you want is:

        "myFileNameWithAnthing_BunchOfNumbersAsUnixTimestamp.ext"

So, I interpret "myFileNameWithAnthing" to mean that any characters
at all are allowed up to the underscore (_). regexp = .*

I interpret "_" as a literal underscore you want in the file name.
regexp = _

Next, I see "BunchOfNumbersAsUnixTimestamp". I'm going to interpret
that as "one or more decimal digits". regexp = \d+

I interpret "." as a literal dot you want in the file name.
regexp = \.

I interpret "ext" as any three characters making up a file name
extension. regexp = ...

I also infer that you want to match the whole filename, so we'll
anchor the pattern to match all the way to the end of the string.
regexp = $

Now if my interpretations are incorrect, you should refine your
descriptions. But if I've interpreted correctly, the combined
regexp for matching an element of your file list is:

        /my/directory/.*_\d+\....$
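
A quick way to check that reading of the pattern is to run it against
a few names by hand. This is only a sketch: the sample names below are
made up for illustration and are not taken from the original post.

```tcl
# Combined pattern from above, in braces so Tcl leaves the backslashes alone.
set pattern {/my/directory/.*_\d+\....$}

# Hypothetical sample names, for illustration only.
set samples {
    /my/directory/report_1073063367.txt
    /my/directory/report_1073063367.html
    /my/directory/report.txt
    /my/directory/report_final.txt
}

set results [list]
foreach name $samples {
    # [regexp] returns 1 when the pattern matches somewhere in the string.
    set matched [regexp $pattern $name]
    lappend results $matched
    puts [format "%-45s %d" $name $matched]
}
```

Only the first name should report 1. The second fails because its
extension has four characters, the third has no underscore, and the
fourth has no digits after the underscore.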

So, does this get you on the right track? :

    set fileList [glob -nocomplain -- /my/directory/*.*]
    set matchingFiles [list]
    foreach fileName $fileList {
        if {[regexp {/my/directory/.*_\d+\....$} $fileName]} {
            lappend matchingFiles $fileName
        }
    }
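
If avoiding the explicit loop is the real goal, [lsearch] can apply a
regexp to each element of a list and return every match. A minimal
sketch, assuming Tcl 8.4 or later (for the -all and -inline options)
and using a hard-coded, made-up list in place of the [glob] result:

```tcl
# Hypothetical file list standing in for the result of [glob].
set fileList [list \
    /my/directory/report_1073063367.txt \
    /my/directory/notes.txt \
    /my/directory/log_12345.dat]

# -regexp  : treat the pattern as a regular expression
# -all     : report every matching element, not just the first
# -inline  : return the elements themselves rather than their indices
set matchingFiles [lsearch -all -inline -regexp $fileList \
    {/my/directory/.*_\d+\....$}]

puts $matchingFiles
```

This still treats the data as a list, element by element, which is
the point: the regexp is applied to each file name, never to the
string form of the whole list.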

As I mentioned before, there are very likely better solutions that
make direct use of the pattern matching capabilities of [glob] itself.
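
One possible shape for such a [glob]-only solution is sketched below.
It is an approximation, not an exact equivalent: the glob pattern
*_[0-9]*.??? only requires one digit after the underscore, so a name
like a_1b.txt would also slip through. The scratch directory and file
names are hypothetical and exist only to make the example runnable.

```tcl
# Hypothetical scratch directory and files, for illustration only.
set dir [file join [pwd] glob_demo]
file mkdir $dir
foreach name {report_1073063367.txt notes.txt draft_final.doc log_12345.dat} {
    close [open [file join $dir $name] w]
}

# The braces stop Tcl from treating [0-9] as a command substitution.
# Glob pattern: anything, underscore, a digit, anything, dot, three characters.
set matchingFiles [lsort [glob -nocomplain -directory $dir -- {*_[0-9]*.???}]]

# Collect just the file names (without the directory) for display.
set tails [list]
foreach f $matchingFiles {
    lappend tails [file tail $f]
}
puts $tails
```

If the looser match matters, the regexp test can still be applied to
the much shorter list that [glob] returns.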

-- 
| Don Porter          Mathematical and Computational Sciences Division |
| donald.porter@nist.gov             Information Technology Laboratory |
| http://math.nist.gov/~DPorter/                                  NIST |
|______________________________________________________________________|

