Re: How to structure a perl program to include and exclude files?
From: Kan Yabumoto (tech_at_xxcopy.com)
Date: 21 Jul 2004 23:23:10 -0700
Henry Law <email@example.com> wrote in message news:<firstname.lastname@example.org>...
> I'm implementing this in Perl but I recognise that there's a strong
> element of language independent program design in the question. Hope
> there's enough perlishness to keep me afloat.
> I am writing a Perl program which will process a file tree and allow
> the user to specify which directories and subdirectories are to be
> included or excluded. (Anyone who uses xxcopy in Win will know
> immediately what I mean). I plan to have the users describe the files
> to include and exclude by means of strict Perl regex's. So a control
> file might look something like this
> include /a # Do files in a and all subdirs
> exclude a/b/~?temp\d* # Except for temp files in a/b
> ... and so on. I haven't worked out the full grammar yet (do I allow
> indefinite series of include..exclude..include? I don't know). But
> I'm having more trouble with conceptualising how to write the program
> in Perl. Current idea is to write a recursive function to process all
> the files in a single directory, calling itself for sub-directories.
I think I can give you some advice here, since I have been
thinking about this issue for many years.
Even though I'm extremely knowledgeable about XXCOPY, I'm not
sure exactly what you are trying to do. Are you trying to create
a perl script so that something similar to XXCOPY can be made
available in Linux (or other) environments?
Currently, XXCOPY's support for inclusion is very limited:
it accepts only variations in the "last name" (e.g.,
/IN:*.mp3 /IN:*.doc /IN:abc*). Other than this exception,
XXCOPY's file-selection mechanisms are all exclusive in nature.
There is good reason for this design. Exclusion specifiers
(in the form of date-range specifications, and filesize-specifications
in addition to file/directory pattern specifications) can all
be treated in an additive manner. As long as the file-selection
parameters (switches in XXCOPY command line) are exclusive
in nature, both the implementation and user-understanding
are very easy. Similar or dissimilar file-selection switches
won't contradict each other. They can overlap (some files
can be excluded for two or more reasons).
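The additive behaviour described above can be sketched in Perl
as a list of independent exclusion filters. This is only an
illustration, not XXCOPY's actual code: the filter thresholds,
the hash keys (name, size, mtime) and the sample files are all
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical exclusion filters; each returns true if the file
# should be dropped. They are independent of one another.
my @exclude = (
    sub { $_[0]{name}  =~ /\.tmp\z/i },      # pattern filter
    sub { $_[0]{size}  > 1_000_000 },        # size filter
    sub { $_[0]{mtime} < 1_000_000_000 },    # date filter (epoch seconds)
);

# A file survives only if no filter excludes it. The order of the
# filters does not matter, and overlapping filters are harmless.
sub is_selected {
    my ($file) = @_;
    for my $filter (@exclude) {
        return 0 if $filter->($file);
    }
    return 1;
}

my @files = (
    { name => 'song.mp3',  size => 4_000,     mtime => 1_090_000_000 },
    { name => 'draft.tmp', size => 2_000_000, mtime => 1_090_000_000 },  # excluded for two reasons
    { name => 'old.doc',   size => 500,       mtime => 900_000_000 },
);

my @kept = map { $_->{name} } grep { is_selected($_) } @files;
print "@kept\n";    # prints "song.mp3"
```

Adding another switch amounts to pushing one more filter onto
the list; nothing already there needs to change.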
On the other hand, if you design a set of command rules that
allows both exclusion and inclusion, you really have to
decide which one will have precedence over the other,
since they are contradictory in nature (not only in the
definition of the command rules, but also for user understanding).
I think it is helpful to put what you are trying to do
into plain English. If you can express what you (the user)
want to do and how you (the programmer) will implement and
document the program actions in plain English with clarity,
you may proceed. But if you are confused about what you are
trying to achieve, you can't program it regardless of the
language you choose.
Let me go back to how XXCOPY presents its capability with
regard to the inclusion and exclusion. The truth is that
the inclusion feature in XXCOPY is really an exclusion
operation in disguise.
1. If there is no inclusion switch (/IN:...), XXCOPY will
   not exclude anything.
       xxcopy \src_dir\ ...
   This is really equivalent to
       xxcopy \src_dir\ /IN:*
2. If the source specifier contains the lastname pattern
   (here, *.mp3), this is equivalent to
       xxcopy \src_dir\ /X:(everything except *.mp3)
3. If the command contains two or more inclusion specifiers,
       xxcopy \src_dir\ /IN:*.mp3 /IN:*.jpg
   this is equivalent to
       xxcopy \src_dir\ /X:(everything except *.mp3 and *.jpg)
The above examples illustrate how XXCOPY internally transforms
the inclusion specifiers into exclusion actions.
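In Perl, the "inclusion is exclusion in disguise" idea might be
sketched as follows. This is an illustration under assumptions,
not XXCOPY's implementation: glob_to_regex and
includes_as_exclusion are hypothetical helpers, only the * and ?
wildcards are handled, and matching is made case-insensitive on
the assumption of Windows-style filenames.

```perl
use strict;
use warnings;

# Convert a simple wildcard pattern (only * and ?) into an
# anchored, case-insensitive regex.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;    # '*.mp3' becomes '\*\.mp3'
    $re =~ s/\\\*/.*/g;          # escaped * -> any run of characters
    $re =~ s/\\\?/./g;           # escaped ? -> any single character
    return qr/\A$re\z/i;
}

# Build ONE exclusion filter from a list of inclusion patterns:
# it excludes every name that matches none of them.
# With no patterns at all, nothing is excluded.
sub includes_as_exclusion {
    my @res = map { glob_to_regex($_) } @_;
    return sub { 0 } unless @res;
    return sub {
        my ($name) = @_;
        for my $re (@res) {
            return 0 if $name =~ $re;    # matches an inclusion: keep
        }
        return 1;                        # matches none: exclude
    };
}

my $exclude = includes_as_exclusion('*.mp3', '*.jpg');
my @kept = grep { !$exclude->($_) } qw(a.mp3 b.jpg c.doc);
print "@kept\n";    # prints "a.mp3 b.jpg"
```

The resulting filter behaves like any other exclusion filter, so
it can sit alongside date and size filters without special
treatment.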
As a matter of fact, date-specifiers, size-specifiers and
all other forms of file-selection mechanisms are treated
as exclusionary actions, which can easily be implemented
as "filters" here and there inside the program. Since
exclusion actions can be applied repeatedly without
concern for precedence, the implementation is
quite simple and the documentation is also straightforward.
The reason why XXCOPY does not support something as simple
as a "list of filenames to process" in a text file
is that it is really an unrestricted form of inclusion
operation. This may not go well with XXCOPY's one-source,
one-destination view of file management operations.
In the future, we plan to implement a full inclusion
feature (even an "inclusion list" supplied as a text file)
in XXCOPY. When we do support such a feature, we plan
to resolve the inclusion-exclusion precedence as follows:
1. Gather all inclusion-specifiers (list of files and
directories) first and define what will be
included (this can even be thought of as an exclusion
list in reverse).
2. Apply all other (exclusionary) specifiers, next.
This will give the exclusion specifiers the precedence.
Note that the precedence in this context does not mean
which one will be evaluated first. Rather, the last
one to be evaluated will prevail (have the lasting effect).
Therefore, in this case, the exclusion specifiers will
have overriding power over the inclusion specifiers.
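A minimal Perl sketch of this two-step rule follows. It is not
taken from XXCOPY; select_files and matches_any are hypothetical
names, and the paths and patterns are made up to resemble the
control-file example in the original question.

```perl
use strict;
use warnings;

# True if $name matches at least one of the compiled regexes.
sub matches_any {
    my ($name, @patterns) = @_;
    for my $re (@patterns) {
        return 1 if $name =~ $re;
    }
    return 0;
}

# Step 1 decides what is included; step 2 applies the exclusions.
# Exclusion is evaluated last, so it always prevails.
sub select_files {
    my ($files, $includes, $excludes) = @_;
    my @selected;
    for my $name (@$files) {
        # Step 1: with no inclusion specifiers, everything is included.
        next if @$includes && !matches_any($name, @$includes);
        # Step 2: an exclusion overrides any inclusion.
        next if matches_any($name, @$excludes);
        push @selected, $name;
    }
    return @selected;
}

my @files = qw(a/song.mp3 a/b/temp1 a/b/notes.txt c/other.txt);
my @kept  = select_files(
    \@files,
    [ qr{\Aa/} ],                   # include: everything under a/
    [ qr{\Aa/b/~?temp\d*\z} ],      # exclude: temp files in a/b
);
print "@kept\n";    # prints "a/song.mp3 a/b/notes.txt"
```

The order in which the user writes the specifiers has no effect
on the result, because all inclusions are tested before any
exclusion.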
Here, I think the rules are clear. When exclusion
and inclusion are mixed, unless you simplify the way
they are treated, the user will be totally confused,
you, the designer, will be confused, and you will not
have a working program whose behavior makes sense.
I'm not necessarily offering this idea as advice
for making a product for sale, which requires formal
documentation. Even if this project is for your own
personal use, you as a programmer and you as the
user have to come to a clear understanding. When you
start talking about "recursion" in the design of
inclusion and exclusion, I think you are clouding your
thoughts. Give one of the two unconditional
precedence over the other. Otherwise, you may never make
something concrete out of your nebulous idea.
The author of XXCopy