Re: Perl script to mimic uniq
From: Martin Foster (mdfoster44_at_netscape.net)
Date: 01/31/04
- Previous message: Joe Smith: "Re: symbolic reference"
- In reply to: nobull_at_mail.com: "Re: Perl script to mimic uniq"
- Next in thread: nobull_at_mail.com: "Re: Perl script to mimic uniq"
- Reply: nobull_at_mail.com: "Re: Perl script to mimic uniq"
Date: 30 Jan 2004 16:46:09 -0800
nobull@mail.com wrote in message news:<4dafc536.0401301107.1d2f7cc9@posting.google.com>...
> mdfoster44@netscape.net (Martin Foster) wrote in message news:<6a20f90a.0401291652.5fae2f4a@posting.google.com>...
> > I would like to be able to mimic the unix tool 'uniq' within a Perl script.
>
> There are Perl implementations of the Unix tools "out there". (Doing a
> web search to find them is left as an exercise for the reader).
>
> > I have a file with entries that look like this
> >
> > 4 10 21 37 58 83 111 145 184 226
> > 4 12 24 42 64 92 124 162 204 252
> > 4 11 23 44 67 95 134 168 215 271
> > .
> > .
> > .
> >
> > There are many number sequences. I would like to analyze the file to find
> > out how often each sequence occurs throughout the file.
>
> That is not what Unix uniq does. 'uniq' compares adjacent lines.
I know; I can sort the lines so that duplicates are adjacent and then use uniq.
>
> Always reduce your problems to their simplest form. The fact that the
> lines of the file happen to be sequences of numbers is not part of
> your problem's simplest form.
>
> I shall assume that you really want to count the number of times each
> distinct line appears in a file.
>
> The canonical Perl one-liner to do this is:
>
> perl -ne '$c{$_}++; END { print "$c{$_} $_" for keys %c }'
>
> Or as a script:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my %count;
>
> $count{$_}++ while <>;
>
> print "$count{$_} $_" for keys %count;
> __END__
>
This is amazing, I don't understand how it works but it's very
powerful.
Can I use this script to compare only n columns of the file, rather than
entire lines?
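
The counting trick rests on a hash whose keys are the lines themselves: each time a line is read, the value stored under that key is incremented, so at the end every distinct key holds its number of occurrences. To count on selected columns only, the key can be built from just those fields. The following is a minimal sketch, not a definitive implementation; the sub name count_by_columns, the 0-based column bounds, and the sample IDs are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count lines by the fields in columns $first..$last (0-based),
# ignoring everything else on the line (such as a leading ID).
sub count_by_columns {
    my ($first, $last, @lines) = @_;
    my %count;
    for my $line (@lines) {
        my @fields = split ' ', $line;                  # split on runs of whitespace
        my $key = join ' ', @fields[$first .. $last];   # only the chosen columns
        $count{$key}++;                                 # a new key starts at undef; ++ makes it 1
    }
    return %count;
}

# Hypothetical sample data: an ID followed by 10 numbers.
my @lines = (
    "1666237 4 10 21 37 58 83 111 145 184 226",
    "1666238 4 12 24 42 64 92 124 162 204 252",
    "1666239 4 10 21 37 58 83 111 145 184 226",
);

my %count = count_by_columns(1, 10, @lines);

# Print the most frequent sequence first.
for my $key (sort { $count{$b} <=> $count{$a} } keys %count) {
    print "$count{$key} $key\n";
}
```

Because the ID in column 0 is left out of the key, the first and third sample lines are counted as the same sequence.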
>
> > I've begun writing a script:
>
> Good. We don't like helping people who don't show what they've tried.
> As a reward I'll give you some general Perl programming tips!
>
> > #!/usr/bin/perl
> > # Perl script to find most common CS
>
> That comment does not describe what the script does.
> Wrong comments are worse than no comments.
>
> > use strict;
>
> Get as much help as you can, use warnings too!
> >
> > my @line;
>
> You never use this variable.
>
> > my $infile = "/home/martin/DATABASE/large.txt";
> > open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
> > my @array = <INFILE>;
> > my $no_lines = $#array;
>
> Variable names should reflect what's in the variable.
>
> There's no point having a variable that's just a copy of $#array
> since you can always just use $#array.
>
> > print "There are ", $no_lines+1, " lines in the large array\n";
>
> It would be more idiomatic to use scalar(@array) rather than $#array+1.
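
A small sketch of the difference between the two, using a made-up three-element array in place of the lines read from the file:

```perl
use strict;
use warnings;

my @array = ("a\n", "b\n", "c\n");   # stand-in for the lines read from the file

my $last_index = $#array;            # index of the last element
my $line_count = scalar(@array);     # number of elements

print "last index $last_index, $line_count lines\n";
```

$#array is always one less than the element count, which is why the original script had to add 1 before printing.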
>
> > my (@table);
> > foreach my $array (@array) {
> > push(@table, [split(/\s/, $array) ]);
> > }
>
> For really simple for/push loops like this consider using map:
>
> my @table = map { [ split ] } @array;
OK, thanks. I've not used map before; I'm just beginning to learn.
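
For comparison, here is a sketch of the loop and the map form side by side, on made-up data. One detail worth noting: split with no arguments (or with ' ') splits on runs of whitespace and skips leading whitespace, whereas split /\s/ splits on every single whitespace character, so a double space yields an empty field.

```perl
use strict;
use warnings;

# Made-up input; note the double space in the second line.
my @array = ("4 10 21\n", "4  12 24\n");

# Explicit loop: build one array reference per line.
my @table_loop;
foreach my $line (@array) {
    push @table_loop, [ split ' ', $line ];
}

# map form: the block runs once per element, with the element in $_.
# [ split ' ', $_ ] is the long spelling of [ split ].
my @table_map = map { [ split ' ', $_ ] } @array;

# split /\s/ treats each whitespace character as a separator.
my @naive = split /\s/, $array[1];

print scalar(@{ $table_map[1] }), " fields with split ' ', ",
      scalar(@naive), " fields with split /\\s/\n";
```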
>
> > my $no_cells = $#{$table[$no_lines]};
>
> Variable names should reflect what's in the variable.
>
> Anyhow you never use that variable.
>
> >
> > for (my $k =0; $k<=$no_lines; $k++) {
>
> Don't use C-style for in Perl unless you need to.
>
> for my $k ( 0 .. $no_lines ) {
>
> > print "[$k] occurs ";
>
> Hang on, $k is the line number (minus one) not the content of the
> line.
> I suspect there's more to your original problem than you are telling
> us.
>
> > my $match=0;
> > my $matched=0;
> > for (my $h =0; $h<=$no_lines; $h++) {
> > for (my $j =3; $j<=12; $j++ ) {
>
> Where did those 3 and 12 come from? I suspect there's more to your
> original problem than you are telling us.
I've got an identifier at the beginning of each line, for example
1666237 4 10 23 16 and so on. The identifier is an ID that links to
something else. I just want to compare the 10 columns with
the numbers.
>
> > if ($table[$k][$j] == $table[$h][$j]){
> > $match++;
> > }
> > }
> > if ($match==10) {
> > $matched++;
> > }
>
> Rather than counting matches and checking you have 10 it would be
> better to count mismatches and check you have 0. That way if the 12
> ever had to become 13 you wouldn't have to change 10 to 11.
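
A sketch of the mismatch-counting approach, with the column range named in one place. The sub name rows_match, the bounds 1 and 10, and the sample rows are assumptions for illustration, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical column bounds, named once so that changing the range
# needs no other edits.
my ($first_col, $last_col) = (1, 10);

# True if two rows (array references) agree in every compared column.
sub rows_match {
    my ($row_a, $row_b) = @_;
    my $mismatches = 0;
    for my $j ($first_col .. $last_col) {
        $mismatches++ if $row_a->[$j] != $row_b->[$j];
    }
    return $mismatches == 0;
}

# Made-up rows: an ID in column 0 followed by 10 numbers.
my @row1 = (1666237, 4, 10, 21, 37, 58, 83, 111, 145, 184, 226);
my @row2 = (1666238, 4, 10, 21, 37, 58, 83, 111, 145, 184, 226);
my @row3 = (1666239, 4, 12, 24, 42, 64, 92, 124, 162, 204, 252);

print rows_match(\@row1, \@row2) ? "match\n" : "differ\n";
print rows_match(\@row1, \@row3) ? "match\n" : "differ\n";
```

The test against 0 stays correct whatever the range is, so only the bounds need to change if a column is added.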
>
> > }
> > print "$matched times\n";
> > } # end of large loop
> >
> > Does anyone know a better, quicker method of doing this?
>
> Doing what? You've moved the goal-posts several times.
>
> > Many thanks in advance for any suggestions.
>
> I suggest that you get clear in your mind what you are asking before
> you ask it.
>
> I also suggest you post to newsgroups that still exist (this one
> doesn't, see FAQ). Your post will then be seen by many more people.
BTW, where is the FAQ that says this newsgroup no longer exists?