Re: Perl script to mimic uniq

nobull_at_mail.com
Date: 02/03/04

    Date: 3 Feb 2004 01:20:02 -0800
    
    

    mdfoster44@netscape.net (Martin Foster) spits TOFU in my face:

    > Thanks for your help.

    Please, if you want to thank me, learn to quote properly. TOFU ((new)
    Text Over, Full-quote Under) is considered very rude.

    > My script now looks like this:
    >
    >
    > #!/usr/bin/perl
    > # Perl script to find most common CS
    > use strict;
    > use warnings;
    >
    > my $infile = "/home/martin/DATABASE/large.txt";
    > open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
    > my %count;
    >
    > do {
    > $_ =~ s/^(\S+\s+){2}//;
    > $count{$_}++
    > } while <INFILE>;

    Please see perldoc perlsyn for how "do { BLOCK } while EXPR" is
    different from "while (EXPR) { BLOCK }". In this case you want the
    latter.
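
    As a minimal sketch of the difference (the two-line input and the
    variable names are made up for illustration, and the read is written
    out explicitly rather than through $_):

```perl
use strict;
use warnings;

# Made-up two-line input; an in-memory filehandle stands in for INFILE.
my $data = "a\nb\n";

# do { BLOCK } while EXPR: the block runs once before anything is read.
my @seen_do;
open my $in_do, '<', \$data or die "Can't open in-memory file: $!";
my $line;
do {
    push @seen_do, defined $line ? $line : 'undef';
} while (defined($line = <$in_do>));
close $in_do;

# while (EXPR) { BLOCK }: the read happens first, so every pass has a line.
my @seen_while;
open my $in_while, '<', \$data or die "Can't open in-memory file: $!";
while (my $l = <$in_while>) {
    push @seen_while, $l;
}
close $in_while;

print "do-while passes: ", scalar(@seen_do), "\n";    # 3 (first pass saw undef)
print "while passes:    ", scalar(@seen_while), "\n"; # 2
```

    In your script that extra first pass is what runs s/// and the
    increment on an undefined value before any line has been read.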

    Saying "$_ =~" i.e. "don't use $_, use $_ instead" is considered
    somewhat affected. Either use $_ (and don't mention it) or use
    something else instead.
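
    Roughly, the two styles look like this (a sketch only; the sample
    lines and the names @implicit, @explicit and $rest are invented for
    the example):

```perl
use strict;
use warnings;

# Made-up sample lines: two identifier columns, then the data.
my @lines = ("id1 id2 1 2 3\n", "id3 id4 4 5 6\n");

# Style 1: use $_ and don't mention it. s/// works on $_ by default.
my @implicit = @lines;
s/^(?:\S+\s+){2}// for @implicit;

# Style 2: use something else instead, and name it.
my @explicit;
for my $line (@lines) {
    (my $rest = $line) =~ s/^(?:\S+\s+){2}//;
    push @explicit, $rest;
}

print @implicit;   # prints "1 2 3" and "4 5 6", one per line
print @explicit;   # same output
```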

    You are assuming the s/// always succeeds. Whenever you assume
    something like this will always succeed you should decorate it with
    "or die". This acts as a comment saying "I'm assuming this always
    succeeds". It also causes the program to crash out rather than carry on
    and do something weird if your assumption was wrong.
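
    For instance, something along these lines (a sketch; the helper name
    strip_two_columns and the sample lines are hypothetical, not part of
    your script):

```perl
use strict;
use warnings;

# Hypothetical helper: strip the first two whitespace-separated columns,
# dying if the line does not have the expected shape.
sub strip_two_columns {
    my ($line) = @_;
    $line =~ s/^(?:\S+\s+){2}//
        or die "Line does not start with two columns: $line";
    return $line;
}

print strip_two_columns("id1 id2 1 2 3\n");   # prints "1 2 3"

# A malformed line now fails loudly instead of being counted as-is.
my $ok = eval { strip_two_columns("oneword\n"); 1 };
print "rejected malformed line\n" unless $ok;
```

    Without the "or die" the malformed line would simply be left
    untouched and counted as if it were a sequence.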

    > So I'm feeding the file into the %count array by removing the first two
    > columns with the identifier information and then counting the keys.
    > How can I still keep the identifier part of the line linked to the array?
    > Since this is the part which I'm really interested in.

    Ah, well you never mentioned that before. It helps to know what you
    want.

    > I can't keep the identifier in
    > the %count array, since this would screw up the "for keys" part.

    You can't keep it in the keys of %count, but you can keep it in the
    values.

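             while (<INFILE>) {
                     # Capture both identifier columns. A quantified group
                     # like (\S+\s+){2} would leave only its last repeat in $1.
                     s/^((?:\S+\s+){2})// or die "Unexpected line format: $_";
                     push @{$count{$_}}, $1;
             }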

     
    > I checked perldoc -q and found how to remove duplicates but I don't think
    > I can rewrite this to do what I want.

    Don't worry, I'm sure your programming skill will improve. You appear
    smart but inexperienced. You do, however, seem to have an unfortunate
    streak of defeatism.

    > The "for keys" method is brillant but I'm losing the identifier.
    >
    > So I'm back to my original script which looks like this.

    Why? I showed you many ways to improve it independent of changing the
    algorithm.

    > #!/usr/bin/perl
    > # Perl script to find most common CS

    I still don't get how this comment relates to what your program does
    nor what you say you want it to do.

    > use strict;
    > use warnings;
    >
    >
    > my $infile = "/home/martin/DATABASE/large.txt";
    > open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
    > my @array = <INFILE>;
    > print "There are ", $#array+1, " lines in the large array\n";
    >
    > my (@table);
    > foreach my $array (@array) {
    > push(@table, [split(/\s/, $array) ]);
    > }
    >
    > for (my $k =0; $k<=$#array; $k++) {
    > print "$table[$k][1] $table[$k][2] occurs ";
    > my $matched=0;
    > for (my $h =0; $h<=$no_lines; $h++) {
    > my $match=0;
    > for (my $j =2; $j<=11; $j++ ) {
    > if ($table[$k][$j] == $table[$h][$j]){
    > $match++;
    > }
    > }
    > if ($match==10) {
    > $matched++;
    > }
    > }
    > print "$matched times\n";
    > } # end of large loop
    >
    >
    > But this sad looking script is not very smart and very slow, I don't want to
    > run over each line. I would like the script to search the file,
    > identify a sequence as unique. If there are duplicate sequences
    > in that file then print out how many and do not revisit that line
    > if it has been counted as a duplicate.

    It's not clear what you are saying.

    Are you saying you want the first ID (only) and the number of
    occurrences of each distinct sequence?

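             while (<INFILE>) {
                     # Capture both identifier columns. A quantified group
                     # like (\S+\s+){2} would leave only its last repeat in $1.
                     s/^((?:\S+\s+){2})// or die "Unexpected line format: $_";
                     push @{$count{$_}}, $1;
             }

             # Each value is the list of IDs that share one sequence.
             # The captured ID keeps its trailing whitespace, hence
             # no space before "occurs".
             for ( values %count ) {
                     print "$_->[0]occurs ", scalar(@$_), " times\n";
             }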

    > I still don't get why you say this newsgroup has been deleted.

    I say it because it is true, and because it will help people who
    didn't know this to reach a larger audience.

    > What is the url for the replacement newsgroup?

    What part of the answer to the Perl FAQ: "What are the Perl newsgroups
    on Usenet?" are you having trouble understanding?

