Re: How to avoid searching this folder?



On Fri, 25 Mar 2011 23:59:18 -0700, "John W. Krahn"
<jwkrahn@xxxxxxxxxxx> wrote:

geoff@xxxxxxxxxxxxxxx wrote:
Hello

Hello,

I am using Tom Boutell's simple search engine on my website but would
like it to not index the files in a particular folder called archives.

How would I modify the code for this? I have tried and so far failed.

Thanks

Geoff

#!/usr/bin/perl

The next two lines should be:

use warnings;
use strict;


$path = "/path/public_html";
$webpath = "";
$indexname = "/path/formmail/searchindex.txt";

my $path = "/path/public_html";
my $webpath = "";
my $indexname = "/path/formmail/searchindex.txt";


$nextFd = 0;

It looks like you don't really need this variable, so what is it really
supposed to do for your program?


open(OUT, ">$indexname");

You should *always* verify that the file was opened correctly before
trying to use what may be an invalid filehandle:

open OUT, '>', $indexname or die "Cannot open '$indexname' because: $!";


&update($path, $webpath);

In modern versions of Perl you don't need to use ampersands on
subroutine calls:

update($path, $webpath);


sub update {
my($path, $webpath) = @_;
my($dd) = $nextFd++;

Why are you storing a number in a variable that you are going to use for
a directory handle? That makes no sense.


print "Updating in $path\n";
if (!opendir($dd, $path)) {
print STDERR "Warning: can't open $path\n";
return;
}

You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:

opendir my $dd, $path or do {
warn "Warning: can't open '$path' because: $!";
return;
};


while ($entry = readdir($dd)) {

while ( my $entry = readdir $dd ) {


if ($entry =~ /^\.$/) {
next;
}

if ($entry =~ /^\.\.$/) {
next;
}

Or simply:

next if $entry =~ /\A\.\.?\z/;


if (-d "$path/$entry") {
&update("$path/$entry", "$webpath/$entry");
next;
}
if (($entry !~ /.html$/i)&& ($entry !~ /.htm$/i)) {
next;
}

You have to escape the period or it will match any character and you can
combine both regular expressions into one (same as example above):

next if $entry !~ /\.html?$/i;
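
To illustrate the point about the unescaped period, here is a small sketch. The file names are made up for the demonstration and are not from the original script; it only compares the two original patterns against the combined one.

```perl
use strict;
use warnings;

# Hypothetical file names, chosen only to show the difference.
my @entries = ( 'index.html', 'page.htm', 'xhtml', 'notes.txt', 'PAGE.HTML' );

# Unescaped period: "." matches any character, so "xhtml" slips through.
my @loose = grep { /.html$/i || /.htm$/i } @entries;

# Escaped period with an optional "l": only real .htm/.html extensions pass.
my @exact = grep { /\.html?$/i } @entries;

print "loose: @loose\n";
print "exact: @exact\n";
```

With these names the loose patterns also accept "xhtml", which has no extension at all, while the escaped pattern does not.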


my($fd) = $nextFd++;

Why are you storing a number in a variable that you are going to use for
a filehandle? That makes no sense.


if (!open($fd, "$path/$entry")) {
print STDERR "Warning: can't open $path/$entry\n";
next;
}

You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:

open my $fd, '<', "$path/$entry" or do {
warn "Warning: can't open '$path/$entry' because: $!";
next;
};


my(%words) = ( );

Or just:

my %words;


my($line);
while ($line =<$fd>) {

Or just:

while ( my $line = <$fd> ) {


# Support for turning off the search engine
# indexer for parts of a page. These markers
# must have a line to themselves. 3/13/00
if ($line =~ /<\!\-\- SEARCH-ENGINE-OFF -->/)
{
while ($line =<$fd>) {
if ($line =~ /<\!\-\- SEARCH-ENGINE-ON -->/) {
last;
}
}
next;
}
# Simple HTML flusher
$line =~ s/\<.*?\>//g;
# Case insensitive
$line =~ tr/A-Z/a-z/;
# If it's not a letter, it's whitespace
$line =~ s/[^a-z]/ /g;

You could also use tr/// for that:

$line =~ tr/a-z/ /c;
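
A quick way to check that the two forms agree, as a sketch with an invented sample line (the string is an assumption, not data from the script):

```perl
use strict;
use warnings;

# Made-up sample line, already lower-cased as in the script.
my $line = "hello, world 42!";

# Substitution form from the original code.
( my $with_s = $line ) =~ s/[^a-z]/ /g;

# tr/// form: /c complements the search list, so every character
# that is not a-z is replaced by a space.
( my $with_tr = $line ) =~ tr/a-z/ /c;

print $with_s eq $with_tr ? "same result\n" : "different result\n";
```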


my(@words) = split(/\s+/, $line);

That might be better as:

my @words = split ' ', $line;


my($p);
for $p (@words) {

Better as:

for my $p ( @words ) {


if (length($p)) {

Why would $p have zero length? Probably because you are using /\s+/
instead of ' ' as the first argument to split which will give you a zero
length string if there is leading whitespace in $line.
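
The difference is easy to see with a short sketch. The sample string is invented for the demonstration:

```perl
use strict;
use warnings;

# Made-up line with leading, doubled and trailing whitespace.
my $line = "  the quick  fox ";

# A regex separator keeps the empty field before the leading whitespace.
my @by_regex = split /\s+/, $line;

# The special ' ' separator skips leading whitespace first.
my @by_space = split ' ', $line;

print scalar(@by_regex), " fields with /\\s+/\n";
print scalar(@by_space), " fields with ' '\n";
```

With split ' ' the length($p) test in the loop should no longer be needed, since no empty strings are produced.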


$words{$p}++;
}
}
}
print OUT "$webpath/$entry ";
my($first) = 1;

Why are you forcing list context on a scalar assignment?


while (($key, $val) = each(%words)) {

Better as:

while ( my ( $key, $val ) = each %words ) {


print OUT "$val:$key";
if ($first) {
$first = 0;
} else {
print OUT " ";
}

So you want no space between the first and second "$val:$key" but a
space after every other occurrence of "$val:$key" including at the end
of the line?


}
print OUT "\n";

It looks like you could probably do that while loop like this instead:

print OUT join( ' ', map "$words{$_}:$_", keys %words ), "\n";
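
As a rough comparison of the two approaches, the sketch below builds both output strings from the same hash. The word counts are invented, and the keys are sorted only so that the result is repeatable; the original script uses each(), which returns keys in no particular order.

```perl
use strict;
use warnings;

# Hypothetical word counts, for illustration only.
my %words = ( perl => 3, search => 1, index => 2 );

# Same logic as the original while loop, collected into a string.
my $old   = '';
my $first = 1;
for my $key ( sort keys %words ) {
    $old .= "$words{$key}:$key";
    if ($first) { $first = 0 } else { $old .= ' ' }
}

# The join/map form suggested above.
my $new = join ' ', map "$words{$_}:$_", sort keys %words;

print "old: [$old]\n";
print "new: [$new]\n";
```

The old form runs the first two entries together and leaves a trailing space; the join form puts exactly one space between entries.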


close($fd);
}
closedir($dd);
}
close(OUT);




John

John,

You have really made a lot of no doubt useful comments but the code is
not mine - it came from Tom Boutell's site and my only concern was to
be able to avoid indexing some particular files/folders.

Cheers

Geoff
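
On the original question of leaving the archives folder out of the index: one possible approach, sketched here under assumptions rather than taken from Tom Boutell's code, is to test the directory name before recursing into it. The helper html_files, the %skip_dir hash and the temporary directory tree are all hypothetical names made up for this example; the sketch assumes the folder should be skipped by name at any depth.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Directory names to leave out (assumption: matched by name at any depth).
my %skip_dir = map { $_ => 1 } qw(archives);

# Hypothetical helper: lists .htm/.html files under $path,
# without descending into any directory named in %skip_dir.
sub html_files {
    my ($path) = @_;
    opendir my $dh, $path or do {
        warn "Warning: can't open '$path' because: $!";
        return;
    };
    my @found;
    for my $entry ( sort readdir $dh ) {
        next if $entry =~ /\A\.\.?\z/;
        if ( -d "$path/$entry" ) {
            next if $skip_dir{$entry};    # the actual "skip this folder" test
            push @found, html_files("$path/$entry");
            next;
        }
        push @found, "$path/$entry" if $entry =~ /\.html?$/i;
    }
    closedir $dh;
    return @found;
}

# Throwaway directory tree to try it on.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/docs", "$root/archives" );
for my $file ( 'index.html', 'docs/a.htm', 'archives/old.html' ) {
    open my $fh, '>', "$root/$file" or die "Cannot create '$root/$file': $!";
    close $fh;
}

my @files = map { substr( $_, length($root) + 1 ) } html_files($root);
print "$_\n" for @files;
```

In the posted script the equivalent change would probably be a single line such as next if $entry eq 'archives'; placed inside the if (-d "$path/$entry") block, before the recursive call to update. That placement is a suggestion and has not been tested against the real site.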