[perl-python] 20050127 traverse a dir

From: Xah Lee (xah_at_xahlee.org)
Date: 01/27/05


Date: 27 Jan 2005 11:37:05 -0800


# -*- coding: utf-8 -*-
# Python

suppose you want to walk through a directory tree, say, to apply a string
replacement to all html files. The os.path.walk() function rises to the
occasion.

© import os
© mydir= '/Users/t/Documents/unix_cilre/python'
© def myfun(s1, s2, s3):
©     print s2 # current dir
©     print s3 # list of files there
©     print '------==(^_^)==------'
© os.path.walk(mydir, myfun, 'somenull')

----------------------
os.path.walk(base_dir,f,arg) will walk a dir tree starting at
base_dir, and whenever it sees a directory (including base_dir), it
will call f(arg,current_dir,children), where the current_dir is the
string of the current directory, and children is a *list* of all
children of the current directory. That is, a list of strings that are
file names and directory names. Try the above and you'll see.
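
os.path.walk exists only in Python 2 and was removed in Python 3, where os.walk is the replacement. As a rough sketch only, the calling convention described above could be imitated on top of os.walk like this. The name walk_with_callback is made up for illustration, and the order of the children list may differ from what os.path.walk would give.

```python
import os

def walk_with_callback(base_dir, f, arg):
    """Call f(arg, current_dir, children) once per directory under base_dir.

    Hypothetical helper: it imitates the os.path.walk convention using
    os.walk, which is what Python 3 provides.
    """
    for current_dir, dirnames, filenames in os.walk(base_dir):
        # os.path.walk handed over files and sub-directories in one list,
        # so the two lists from os.walk are joined here.
        children = dirnames + filenames
        f(arg, current_dir, children)

def show(arg, current_dir, children):
    print(current_dir)   # current dir
    print(children)      # list of names found there
```

Calling walk_with_callback(mydir, show, None) would then print each directory followed by the names in it, much like the first example.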

now, suppose for each file ending in .html we want to apply function
g to it. So, whenever myfun is called, we need to loop through the
children list, find the entries that end in .html and are files (not
directories), then call g on each. Here's the code.

© import os
© mydir= '/Users/t/web/SpecialPlaneCurves_dir'
© def g(s): print "g touched:", s
© def myfun(dummy, dirr, filess):
©     for child in filess:
©         path = os.path.join(dirr, child) # full path of the child
©         if '.html' == os.path.splitext(child)[1] \
©            and os.path.isfile(path):
©             g(path)
© os.path.walk(mydir, myfun, 3)

note that os.path.splitext splits a string into two parts: the portion
before the last period, and the rest (the period and everything after
it). Effectively it is used for getting the file suffix. And
os.path.isfile() makes sure that this is a file, not a dir with a .html
suffix. Test it yourself.
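
Here is a small sketch of the same filter for Python 3, where os.walk takes the place of os.path.walk. The function name html_files is invented for this example. It assumes the suffix test should be case-sensitive, as in the code above.

```python
import os

def html_files(base_dir):
    """Yield the full path of every regular file under base_dir ending in .html."""
    for current_dir, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            path = os.path.join(current_dir, name)
            # splitext('index.html') gives ('index', '.html');
            # splitext('archive.tar.gz') gives ('archive.tar', '.gz')
            if os.path.splitext(name)[1] == '.html' and os.path.isfile(path):
                yield path
```

A loop such as "for p in html_files(mydir): g(p)" would then apply g to each match. A directory that happens to be named something.html is skipped, since os.walk reports directories separately from files.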

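one important thing to note: whether mydir ends in a slash matters if
you build paths by string concatenation such as dirr+child, since Python
will not add or drop the separator for you. os.path.join(dirr, child)
puts in exactly one separator either way. This took me a while to debug.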
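
To see the slash issue in isolation, here is a short sketch. It uses posixpath, which is the Unix flavour of os.path, so the result is the same on any platform. The name child_path and the file name index.html are made up for illustration.

```python
import posixpath

def child_path(dirr, child):
    # posixpath.join adds a '/' only when dirr does not already end in one
    return posixpath.join(dirr, child)

for base in ('/Users/t/web', '/Users/t/web/'):
    print(child_path(base, 'index.html'))  # /Users/t/web/index.html both times
    print(base + 'index.html')             # the slash is missing when base lacks one
```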

also, the semantics of os.path.walk are nice. The myfun can
be a recursive function, calling itself, crystallizing a program's
semantics.
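
For comparison, here is a traversal written by hand as a recursive function, as a sketch only. It is not how os.path.walk is implemented. The name visit is made up, and it keeps the f(arg, current_dir, children) convention from above. It assumes symbolic links to directories should not be followed.

```python
import os

def visit(current_dir, f, arg):
    """Call f(arg, current_dir, children), then recurse into each sub-directory."""
    children = sorted(os.listdir(current_dir))
    f(arg, current_dir, children)
    for child in children:
        path = os.path.join(current_dir, child)
        # recurse only into real directories, not into symlinks to them
        if os.path.isdir(path) and not os.path.islink(path):
            visit(path, f, arg)
```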

---------------------------
# in Perl, a similar program can be had.
# the prototypical way to traverse a dir
# is through File::Find;

use File::Find qw(find);
$mydir= '/Users/t/web/SpecialPlaneCurves_dir';
find(\&wanted, $mydir);
sub g($){print shift, "\n";}
sub wanted {
    if ($_ =~/\.html$/ && -T $File::Find::name) { g $File::Find::name;}
    $File::Find::name;
}

# the above showcases a quick hack.
# File::Find is one of the worst modules
# there is in Perl. One cannot use it
# with a recursive (so-called) "filter"
# function. And because of the way it is
# written, one cannot make the filter
# function purely functional. (it relies
# on the $_) And the filter function
# must come in a certain order. (for
# example, the above program won't work
# if g is moved to the bottom, because
# g is called without parentheses
# and so must be declared first.) ...

# the quality of modules in Perl is
# all like that.
Xah
 xah@xahlee.org
 http://xahlee.org/PageTwo_dir/more.html


