Re: How to avoid searching this folder?
- From: geoff@xxxxxxxxxxxxxxx
- Date: Sun, 27 Mar 2011 22:26:07 +0100
On Fri, 25 Mar 2011 23:59:18 -0700, "John W. Krahn"
<jwkrahn@xxxxxxxxxxx> wrote:
geoff@xxxxxxxxxxxxxxx wrote:
Hello
Hello,
I am using Tom Boutell's simple search engine on my website but would
like it to not index the files in a particular folder called archives.
How would I modify the code for this? I have tried and so far failed.
Thanks
Geoff
#!/usr/bin/perl
The next two lines should be:
use warnings;
use strict;
$path = "/path/public_html";
$webpath = "";
$indexname = "/path/formmail/searchindex.txt";
my $path = "/path/public_html";
my $webpath = "";
my $indexname = "/path/formmail/searchindex.txt";
$nextFd = 0;
It looks like you don't really need this variable, so what is it really
supposed to do for your program?
open(OUT, ">$indexname");
You should *always* verify that the file was opened correctly before
trying to use what may be an invalid filehandle:
open OUT, '>', $indexname or die "Cannot open '$indexname' because: $!";
&update($path, $webpath);
In modern versions of Perl you don't need to use ampersands on
subroutine calls:
update($path, $webpath);
sub update {
my($path, $webpath) = @_;
my($dd) = $nextFd++;
Why are you storing a number in a variable that you are going to use for
a directory handle? That makes no sense.
print "Updating in $path\n";
if (!opendir($dd, $path)) {
print STDERR "Warning: can't open $path\n";
return;
}
You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:
opendir my $dd, $path or do {
warn "Warning: can't open '$path' because: $!";
return;
};
while ($entry = readdir($dd)) {
while ( my $entry = readdir $dd ) {
if ($entry =~ /^\.$/) {
next;
}
if ($entry =~ /^\.\.$/) {
next;
}
Or simply:
next if $entry =~ /\A\.\.?\z/;
if (-d "$path/$entry") {
&update("$path/$entry", "$webpath/$entry");
next;
}
if (($entry !~ /.html$/i)&& ($entry !~ /.htm$/i)) {
next;
}
You have to escape the period, or it will match any character, and you
can combine both regular expressions into one (same idea as the example
above):
next if $entry !~ /\.html?$/i;
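To see why the escape matters, here is a minimal sketch comparing the two original patterns with the combined, escaped one. The sub names and the sample file names are made up for illustration; they are not part of the search engine script.

```perl
use strict;
use warnings;

# Original pair of tests: the unescaped '.' matches any character.
sub is_html_loose {
    my ($name) = @_;
    return ( $name =~ /.html$/i || $name =~ /.htm$/i ) ? 1 : 0;
}

# Combined test with the period escaped, so a literal '.' is required.
sub is_html_strict {
    my ($name) = @_;
    return ( $name =~ /\.html?$/i ) ? 1 : 0;
}

# Hypothetical file names, chosen only to show the difference.
for my $name ( 'index.html', 'page.HTM', 'xhtml', 'notes.txt' ) {
    printf "%-12s loose=%d strict=%d\n",
        $name, is_html_loose($name), is_html_strict($name);
}
```

A file literally named xhtml passes the unescaped test because the leading "x" satisfies the dot, while the escaped version rejects it.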
my($fd) = $nextFd++;
Why are you storing a number in a variable that you are going to use for
a filehandle? That makes no sense.
if (!open($fd, "$path/$entry")) {
print STDERR "Warning: can't open $path/$entry\n";
next;
}
You should declare variables where you first use them and you should
include $! in the error message so you know why it failed:
open my $fd, '<', "$path/$entry" or do {
warn "Warning: can't open '$path/$entry' because: $!";
next;
};
my(%words) = ( );
Or just:
my %words;
my($line);
while ($line =<$fd>) {
Or just:
while ( my $line = <$fd> ) {
# Support for turning off the search engine
# indexer for parts of a page. These markers
# must have a line to themselves. 3/13/00
if ($line =~ /<\!\-\- SEARCH-ENGINE-OFF -->/)
{
while ($line =<$fd>) {
if ($line =~ /<\!\-\- SEARCH-ENGINE-ON -->/) {
last;
}
}
next;
}
# Simple HTML flusher
$line =~ s/\<.*?\>//g;
# Case insensitive
$line =~ tr/A-Z/a-z/;
# If it's not a letter, it's whitespace
$line =~ s/[^a-z]/ /g;
You could also use tr/// for that:
$line =~ tr/a-z/ /c;
my(@words) = split(/\s+/, $line);
That might be better as:
my @words = split ' ', $line;
my($p);
for $p (@words) {
Better as:
for my $p ( @words ) {
if (length($p)) {
Why would $p have zero length? Probably because you are using /\s+/
instead of ' ' as the first argument to split which will give you a zero
length string if there is leading whitespace in $line.
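The difference between the two forms of split can be checked with a short sketch. The sample line is invented; any string with leading whitespace behaves the same way.

```perl
use strict;
use warnings;

# Hypothetical input with leading whitespace, as left behind once
# non-letters have been turned into spaces.
my $line = '  foo bar';

# A /\s+/ pattern keeps the empty field before the first separator.
my @with_regex = split /\s+/, $line;

# The special ' ' pattern skips leading whitespace first.
my @with_space = split ' ', $line;

print scalar(@with_regex), " fields with /\\s+/\n";
print scalar(@with_space), " fields with ' '\n";
```

With the ' ' form there is no empty first element, so the length($p) test in the loop is no longer needed.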
$words{$p}++;
}
}
}
print OUT "$webpath/$entry ";
my($first) = 1;
Why are you forcing list context on a scalar assignment?
while (($key, $val) = each(%words)) {
Better as:
while ( my ( $key, $val ) = each %words ) {
print OUT "$val:$key";
if ($first) {
$first = 0;
} else {
print OUT " ";
}
So you want no space between the first and second "$val:$key" but a
space after every other occurrence of "$val:$key" including at the end
of the line?
}
print OUT "\n";
It looks like you could probably do that while loop like this instead:
print OUT join( ' ', map "$words{$_}:$_", keys %words ), "\n";
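A rough sketch of how the two approaches differ in spacing. The sub names and the sample hash are hypothetical, and the keys are sorted here only so the result is repeatable; the original loop uses each(), which returns entries in no particular order.

```perl
use strict;
use warnings;

# Mimics the spacing of the original while loop: no space after the
# first entry, one space after every later entry.
sub original_format {
    my ($words) = @_;
    my $out   = '';
    my $first = 1;
    for my $key ( sort keys %$words ) {
        $out .= "$words->{$key}:$key";
        if ($first) { $first = 0 } else { $out .= ' ' }
    }
    return $out;
}

# The join version: exactly one space between entries, none at the end.
sub joined_format {
    my ($words) = @_;
    return join ' ', map "$words->{$_}:$_", sort keys %$words;
}

# Hypothetical word counts.
my %words = ( a => 1, b => 2, c => 3 );
print '[', original_format( \%words ), "]\n";
print '[', joined_format( \%words ),   "]\n";
```

The original loop runs the first two entries together and leaves a trailing space, so anything that reads the index back by splitting on spaces would misread that first pair.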
close($fd);
}
closedir($dd);
}
close(OUT);
John
John,
You have made a lot of no doubt useful comments, but the code is not
mine. It came from Tom Boutell's site, and my only concern was to be
able to avoid indexing some particular files/folders.
Cheers
Geoff
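For the question actually asked, one possible approach is to test the entry name before recursing into a directory. The following is only a sketch under assumptions: the sub name collect_html and the %skip hash are made up for illustration, and it returns a list of file paths instead of writing the index.

```perl
use strict;
use warnings;

# Hypothetical skip list; 'archives' is the folder named in the question.
my %skip = map { $_ => 1 } ('archives');

# Recursively collect .htm/.html files under $path, skipping any
# directory whose name appears in %skip.
sub collect_html {
    my ($path) = @_;
    my @found;
    opendir my $dh, $path or do {
        warn "Warning: can't open '$path' because: $!";
        return;
    };
    while ( defined( my $entry = readdir $dh ) ) {
        next if $entry =~ /\A\.\.?\z/;
        if ( -d "$path/$entry" ) {
            next if $skip{$entry};    # do not descend into skipped folders
            push @found, collect_html("$path/$entry");
            next;
        }
        push @found, "$path/$entry" if $entry =~ /\.html?\z/i;
    }
    closedir $dh;
    return @found;
}
```

In the script as posted, the equivalent change would probably be a single line such as next if $entry eq 'archives'; placed just before the -d test in update(). That skips every folder with that name at any depth, so compare against "$path/$entry" instead if only one specific folder should be excluded.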