Re: Perl script to mimic uniq
nobull_at_mail.com
Date: 02/03/04
- Previous message: elastic: "Re: simple timer for win32, solaris and linux"
- In reply to: Martin Foster: "Re: Perl script to mimic uniq"
- Next in thread: Martin Foster: "Re: Perl script to mimic uniq"
- Reply: Martin Foster: "Re: Perl script to mimic uniq"
Date: 3 Feb 2004 01:20:02 -0800
mdfoster44@netscape.net (Martin Foster) spits TOFU in my face:
> Thanks for your help.
Please, if you want to thank me, learn to quote properly. TOFU (Text Over,
Full-quote Under) is considered very rude.
> My script now looks like this:
>
>
> #!/usr/bin/perl
> # Perl script to find most common CS
> use strict;
> use warnings;
>
> my $infile = "/home/martin/DATABASE/large.txt";
> open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
> my %count;
>
> do {
> $_ =~ s/^(\S+\s+){2}//;
> $count{$_}++
> } while <INFILE>;
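Please see perldoc perlsyn for how "do { BLOCK } while EXPR" differs
from "while (EXPR) { BLOCK }". The BLOCK of a do-while always runs once
before EXPR is evaluated, so your first pass works on $_ before any line
has been read (normally undef), which triggers "uninitialized" warnings
and counts a bogus empty key. In this case you want the latter.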
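A small self-contained sketch of the difference, assuming a made-up two-line sample read from an in-memory string rather than from large.txt. The do-while version makes one extra pass before any line has been read:

```perl
#!/usr/bin/perl
# Sketch: compare "do { BLOCK } while <FH>" with "while (<FH>) { BLOCK }".
# The sample data is hypothetical and read from an in-memory string.
use strict;
use warnings;

my $data = "id1 a1 ACGT\nid2 a2 CCCC\n";

# do-while: the BLOCK runs once BEFORE the first line is read.
my @seen_do;
{
    open my $fh, '<', \$data or die "Can't open in-memory data: $!";
    local $_;    # make sure $_ starts out undefined
    do {
        push @seen_do, defined $_ ? $_ : '(undef)';
    } while <$fh>;
    close $fh;
}

# while: the condition reads a line first, so every pass sees a real line.
my @seen_while;
{
    open my $fh, '<', \$data or die "Can't open in-memory data: $!";
    while (<$fh>) {
        push @seen_while, $_;
    }
    close $fh;
}

print "do-while passes: ", scalar(@seen_do), " (first saw $seen_do[0])\n";
print "while passes:    ", scalar(@seen_while), "\n";
```

With two input lines the do-while makes three passes, the first of them on an undefined $_; the while loop makes exactly two.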
Saying "$_ =~", i.e. "don't use $_, use $_ instead", is considered
somewhat affected, since s/// works on $_ by default. Either use $_
(and don't mention it) or bind to some other variable instead.
You are assuming the s/// always succeeds. Whenever you assume that
something like this will always succeed, you should decorate it with
"or die". This acts as a comment saying "I'm assuming this always
succeeds". It also causes the program to crash out, rather than carry
on and do something weird, if your assumption was wrong.
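For example, a minimal sketch of the "or die" habit. The helper name strip_ids() and the sample lines are made up for illustration:

```perl
#!/usr/bin/perl
# Sketch: decorate an "always succeeds" s/// with "or die" so a malformed
# line stops the program instead of being silently miscounted.
# strip_ids() and the sample lines are hypothetical.
use strict;
use warnings;

# Remove the first two whitespace-separated columns; die if they aren't there.
sub strip_ids {
    local $_ = shift;
    s/^(?:\S+\s+){2}// or die "Line has fewer than two leading columns: $_";
    return $_;
}

my $good = strip_ids("id1 id2 ACGT\n");                        # "ACGT\n"
my $bad  = eval { strip_ids("ACGT\n"); 1 } ? 'passed' : 'died';

print "good line left: $good";
print "bad line: $bad\n";
```

The malformed line has only one column, so the substitution fails and die fires; eval is used here only so the sketch can report that instead of exiting.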
> So I'm feeding the file into the %count array by removing the first two
> columns with the identifier information and then counting the keys.
> How can I still keep the identifier part of the line linked to the array?
> Since this is the part which I'm really interested in.
Ah, well you never mentioned that before. It helps to know what you
want.
> I can't keep the identifier in
> the %count array, since this would screw up the "for keys" part.
You can't keep it in the keys of %count, but you can keep it in the
values.
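while (<INFILE>) {
    # (?:...) groups without capturing, so $1 holds BOTH ID columns;
    # (\S+\s+){2} would leave only the second column in $1.
    s/^((?:\S+\s+){2})// or die;
    push @{$count{$_}}, $1;
}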
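A runnable sketch of that hash-of-arrays layout, assuming (hypothetically) that each line is two ID columns followed by the sequence. It reads an in-memory sample instead of large.txt and captures the IDs without their trailing whitespace:

```perl
#!/usr/bin/perl
# Sketch: key %count by the sequence part of each line and keep the
# identifiers in the values (a hash of arrays). The sample data and the
# "two ID columns, then sequence" layout are assumptions.
use strict;
use warnings;

my $data = <<'END';
id1 a1 ACGT TTGA
id2 a2 CCCC GGGG
id3 a3 ACGT TTGA
END

open my $in, '<', \$data or die "Can't open in-memory data: $!";
my %count;
while (<$in>) {
    # $1 = the two ID columns; the rest of the line (the sequence) stays in $_.
    s/^(\S+\s+\S+)\s+// or die "Unexpected line format: $_";
    push @{ $count{$_} }, $1;    # autovivifies one array ref per sequence
}
close $in;

for my $seq (sort keys %count) {
    my @ids = @{ $count{$seq} };
    chomp(my $shown = $seq);
    print "$shown: ", scalar(@ids), " line(s), ids: ", join(', ', @ids), "\n";
}
```

Each key of %count is a distinct sequence, and the array behind it lists every ID that carried that sequence, in file order.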
> I checked perldoc -q and found how to remove duplicates but I don't think
> I can rewrite this to do what I want.
Don't worry, I'm sure your programming skill will improve. You appear
smart but inexperienced. You do, however, seem to have an unfortunate
streak of defeatism.
> The "for keys" method is brillant but I'm losing the identifier.
>
> So I'm back to my original script which looks like this.
Why? I showed you many ways to improve it independent of changing the
algorithm.
> #!/usr/bin/perl
> # Perl script to find most common CS
I still don't see how this comment relates to what your program does,
or to what you say you want it to do.
> use strict;
> use warnings;
>
>
> my $infile = "/home/martin/DATABASE/large.txt";
> open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
> my @array = <INFILE>;
> print "There are ", $#array+1, " lines in the large array\n";
>
> my (@table);
> foreach my $array (@array) {
> push(@table, [split(/\s/, $array) ]);
> }
>
> for (my $k =0; $k<=$#array; $k++) {
> print "$table[$k][1] $table[$k][2] occurs ";
> my $matched=0;
> for (my $h =0; $h<=$no_lines; $h++) {
> my $match=0;
> for (my $j =2; $j<=11; $j++ ) {
> if ($table[$k][$j] == $table[$h][$j]){
> $match++;
> }
> }
> if ($match==10) {
> $matched++;
> }
> }
> print "$matched times\n";
> } # end of large loop
>
>
> But this sad looking script is not very smart and very slow, I don't want to
> run over each line. I would like the script to search the file,
> identify a sequence as unique. If there are duplicate sequences
> in that file then print out how many and do not revisit that line
> if it has been counted as a duplicate.
It's not clear what you are saying.
Are you saying you want the first ID (only) and the number of
occurrences of each distinct sequence?
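while (<INFILE>) {
    # $1 holds both ID columns plus their trailing whitespace,
    # so no extra space is needed before "occurs" below.
    s/^((?:\S+\s+){2})// or die;
    push @{$count{$_}}, $1;
}
for ( values %count ) {
    print "$_->[0]occurs ", scalar(@$_), " times\n";
}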
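Put together, a self-contained sketch. It assumes the same hypothetical "two ID columns, then sequence" layout and an in-memory sample; sorting the report so the most common sequence comes first is an addition of mine, not part of your spec:

```perl
#!/usr/bin/perl
# Sketch: print the first ID and the number of occurrences of each distinct
# sequence, most frequent first. Sample data and column layout are assumed.
use strict;
use warnings;

my $data = <<'END';
id1 a1 ACGT TTGA
id2 a2 CCCC GGGG
id3 a3 ACGT TTGA
id4 a4 ACGT TTGA
END

open my $in, '<', \$data or die "Can't open in-memory data: $!";
my %count;
while (<$in>) {
    s/^(\S+\s+\S+)\s+// or die "Unexpected line format: $_";
    push @{ $count{$_} }, $1;
}
close $in;

# Each line is read only once; duplicates are grouped by the hash, so no
# line has to be compared against every other line.
my @report;
for my $ids (sort { @$b <=> @$a or $a->[0] cmp $b->[0] } values %count) {
    push @report, "$ids->[0] occurs " . scalar(@$ids) . " times";
    print "$report[-1]\n";
}
```

Unlike the nested-loop version, this makes a single pass over the file, which is what makes it fast on a large input.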
> I still don't get why you say this newsgroup has been deleted.
I say it because it is true, and because it will help people who
didn't know this to reach a larger audience.
> What is the url for the replacement newsgroup?
What part of the answer to the Perl FAQ: "What are the Perl newsgroups
on Usenet?" are you having trouble understanding?