Re: Perl script to mimic uniq

From: Martin Foster (mdfoster44_at_netscape.net)
Date: 01/31/04

  • Next message: Jürgen Exner: "Re: Perl script to mimic uniq"
    Date: 30 Jan 2004 16:46:09 -0800
    
    

    nobull@mail.com wrote in message news:<4dafc536.0401301107.1d2f7cc9@posting.google.com>...
    > mdfoster44@netscape.net (Martin Foster) wrote in message news:<6a20f90a.0401291652.5fae2f4a@posting.google.com>...
    > > I would like to be able to mimic the unix tool 'uniq' within a Perl script.
    >
    > There are Perl implementations of the Unix tools "out there". (Doing
    > a web search to find them is left as an exercise for the reader).
    >
    > > I have a file with entries that look like this
    > >
    > > 4 10 21 37 58 83 111 145 184 226
    > > 4 12 24 42 64 92 124 162 204 252
    > > 4 11 23 44 67 95 134 168 215 271
    > > .
    > > .
    > > .
    > >
    > > There are many number sequences. I would like to analyze the file to
    > > find out how often each sequence occurs throughout the file.
    >
    > That is not what Unix uniq does. 'uniq' compares adjacent lines.

    I know, I can sort lines to be adjacent and then use uniq.
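
For reference, the adjacent-line comparison that 'uniq' performs can be sketched in Perl roughly as follows. This is only an illustration of the idea, not the code of any real uniq implementation; the sub name count_adjacent and the sample lines are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse runs of identical adjacent lines, in the spirit of `uniq -c`.
# Takes a list of lines; returns a list of [count, line] pairs.
sub count_adjacent {
    my @lines = @_;
    my @runs;
    for my $line (@lines) {
        if (@runs && $runs[-1][1] eq $line) {
            $runs[-1][0]++;           # same as the previous line: extend the run
        }
        else {
            push @runs, [1, $line];   # different line: start a new run
        }
    }
    return @runs;
}

# Sorting first makes equal lines adjacent, as in `sort file | uniq -c`.
my @sample = ("4 10 21", "4 12 24", "4 10 21");   # hypothetical sample data
printf "%7d %s\n", @$_ for count_adjacent(sort @sample);
```

Without the sort, the two "4 10 21" lines would be reported as two separate runs of one, which is why the input has to be sorted before this approach counts duplicates across the whole file.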

    >
    > Always reduce your problems to their simplest form. The fact that the
    > lines of the file happen to be sequences of numbers is not part of
    > your problem's simplest form.
    >
    > I shall assume that you really want to count the number of times each
    > distinct line appears in a file.
    >
    > The canonical Perl one-liner to do this is:
    >
    > perl -ne '$c{$_}++; END { print "$c{$_} $_" for keys %c }'
    >
    > Or as a script:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    >
    > my %count;
    >
    > $count{$_}++ while <>;
    >
    > print "$count{$_} $_" for keys %count;
    > __END__
    >
    This is amazing, I don't understand how it works but it's very
    powerful.
    Can I use this script to compare just n columns of each line, not the
    entire line?
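
A longer-hand sketch of the same counting idea may make it easier to follow. The sub name count_lines and the sample lines are invented for the illustration; the logic is meant to be the same as in the quoted script, with the output additionally sorted by count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each distinct line occurs.
# Takes a list of lines; returns a reference to a hash of line => count.
sub count_lines {
    my %count;
    for my $line (@_) {
        # The line itself is the hash key. The first ++ on a new key
        # creates the entry (undef becomes 1); later ones increment it.
        $count{$line}++;
    }
    return \%count;
}

my @lines = ("4 10 21\n", "4 12 24\n", "4 10 21\n");   # stand-in for <>
my $count = count_lines(@lines);

# keys returns the distinct lines in no particular order, so sort them
# by descending count to get readable output.
for my $line (sort { $count->{$b} <=> $count->{$a} } keys %$count) {
    print "$count->{$line} $line";
}
```

Because every line is looked up in the hash once, the file is read a single time and no line is ever compared against another line directly.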

    >
    > > I've begun writing a script:
    >
    > Good. We don't like helping people who don't show what they've tried.
    > As a reward I'll give you some general Perl programming tips!
    >
    > > #!/usr/bin/perl
    > > # Perl script to find most common CS
    >
    > That comment does not describe what the script does.
    > Wrong comments are worse than no comments.
    >
    > > use strict;
    >
    > Get as much help as you can, use warnings too!
    > >
    > > my @line;
    >
    > You never use this variable.
    >
    > > my $infile = "/home/martin/DATABASE/large.txt";
    > > open INFILE, $infile or die "***! Couldn't open file $infile: $!\n";
    > > my @array = <INFILE>;
    > > my $no_lines = $#array;
    >
    > Variable names should reflect what's in the variable.
    >
    > There's no point having a variable that's just a copy of $#array
    > since you can always just use $#array.
    >
    > > print "There are ", $no_lines+1, " lines in the large array\n";
    >
    > It would be more idiomatic to use scalar(@array) rather than $#array+1.
    >
    > > my (@table);
    > > foreach my $array (@array) {
    > > push(@table, [split(/\s/, $array) ]);
    > > }
    >
    > For really simple for/push loops like this consider using map:
    >
    > my @table = map { [ split ] } @array;

    Ok. Thanks, I've not used map before, just beginning to learn.
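
For anyone else new to map, here is a small side-by-side sketch of the loop and the map form. The sample lines and the names @table_loop and @table_map are made up for the comparison, and both versions use split ' ' so that they produce identical results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ("4 10 21\n", "4 12 24\n");   # stand-in for the lines read from INFILE

# Explicit loop: build one array reference of fields per line.
my @table_loop;
foreach my $line (@array) {
    push @table_loop, [ split ' ', $line ];
}

# map form: the block runs once per element with $_ set to that element,
# and the values it returns are collected into the new list.
my @table_map = map { [ split ' ', $_ ] } @array;

print "$table_map[1][2]\n";   # third field of the second line
```

Note that split ' ' (like split with no arguments) splits on runs of whitespace and ignores leading whitespace, whereas split(/\s/, ...) splits on every single whitespace character, so the two can differ when fields are separated by more than one space.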

    >
    > > my $no_cells = $#{$table[$no_lines]};
    >
    > Variable names should reflect what's in the variable.
    >
    > Anyhow you never use that variable.
    >
    > >
    > > for (my $k =0; $k<=$no_lines; $k++) {
    >
    > Don't use C-style for in Perl unless you need to.
    >
    > for my $k ( 0 .. $no_lines ) {
    >
    > > print "[$k] occurs ";
    >
    > Hang on, $k is the line number (minus one) not the content of the
    > line.
    > I suspect there's more to your original problem than you are telling
    > us.
    >
    > > my $match=0;
    > > my $matched=0;
    > > for (my $h =0; $h<=$no_lines; $h++) {
    > > for (my $j =3; $j<=12; $j++ ) {
    >
    > Where did those 3 and 12 come from? I suspect there's more to your
    > original problem than you are telling us.

    I've got an identifier at the beginning of each line, for example

    1666237 4 10 23 16 and so on. The identifier is an id that links to
    something else. I just want to compare the 10 columns with
    the numbers.
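
One way the hash-counting approach might be adapted to this layout is to use only the number columns as the hash key. The sketch below assumes the identifier is the single first whitespace-separated field and that everything after it is the sequence; if there are other columns in between, the split would need adjusting. The sub name count_sequences and the sample rows (shortened to four number columns) are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group lines by their number sequence, ignoring the leading identifier.
# Returns a reference to a hash of "sequence" => [ids that have it].
sub count_sequences {
    my %ids_for;
    for my $line (@_) {
        my ($id, @seq) = split ' ', $line;
        next unless @seq;                    # skip blank or id-only lines
        push @{ $ids_for{"@seq"} }, $id;     # key is the sequence joined by spaces
    }
    return \%ids_for;
}

my @lines = (
    "1666237 4 10 21 37\n",   # made-up sample rows
    "1666238 4 12 24 42\n",
    "1666239 4 10 21 37\n",
);
my $ids_for = count_sequences(@lines);
for my $seq (sort keys %$ids_for) {
    my @ids = @{ $ids_for->{$seq} };
    print scalar(@ids), " x $seq (ids: @ids)\n";
}
```

Keeping the identifiers in the hash value, rather than just a count, means each sequence can still be linked back to the lines it came from.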

    >
    > > if ($table[$k][$j] == $table[$h][$j]){
    > > $match++;
    > > }
    > > }
    > > if ($match==10) {
    > > $matched++;
    > > }
    >
    > Rather than counting matches and checking you have 10 it would be
    > better to count mismatches and check you have 0. That way if the 12
    > ever had to become 13 you wouldn't have to change 10 to 11.
    >
    > > }
    > > print "$matched times\n";
    > > } # end of large loop
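
The mismatch-checking suggestion above could look something like the following sketch. The sub name same_columns, its column-range parameters, and the sample table are invented for the illustration; the rows are assumed to hold an identifier in column 0 followed by the numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if two rows agree in every column from $first to $last.
# It stops at the first mismatch, so no expected match count is needed.
sub same_columns {
    my ($row_a, $row_b, $first, $last) = @_;
    for my $j ($first .. $last) {
        return 0 if $row_a->[$j] != $row_b->[$j];
    }
    return 1;
}

my @table = (
    [ 1666237, 4, 10, 21 ],   # made-up rows: id followed by three numbers
    [ 1666238, 4, 12, 24 ],
    [ 1666239, 4, 10, 21 ],
);
for my $k (0 .. $#table) {
    # grep in scalar context returns how many rows matched.
    my $matched = grep { same_columns($table[$k], $_, 1, $#{ $table[$k] }) } @table;
    print "[$k] occurs $matched times\n";
}
```

This still compares every row against every other row, so the work grows with the square of the number of lines; the hash-based count earlier in the thread needs only one pass over the file.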
    > >
    > > Does anyone know a better, quicker method of doing this?
    >
    > Doing what? You've moved the goal-posts several times.
    >
    > > Many thanks in advance for any suggestions.
    >
    > I suggest that you get clear in your mind what you are asking before
    > you ask it.
    >
    > I also suggest you post to newsgroups that still exist (this one
    > doesn't, see FAQ). Your post will then be seen by many more people.
    BTW where is the FAQ, which says this newsgroup no longer exists?

