Re: script to find most common last names in a file

From: Mark Day (soundz_at_techie.com)
Date: 11/04/04


To: beginners@perl.org
Date: Thu, 04 Nov 2004 13:27:34 -0800


John W. Krahn wrote:
> Mark Day wrote:
>
>> Hoping someone can offer some advice on this.
>> I had an idea to write a script that could open a file
>> of names, first and last, that would search for the most frequently
>> occurring last names and write out the first and last names of all
>> those that match to a new file. I am a beginner, and of course I have
>> quickly run up against the limit of my knowledge.
>>
>> Here's what I've got, (not much), hope it makes some sort of sense
>>
>> names.txt format:
>> --------------
>> John Smith
>> Sue Jones
>> Dave Smith
>> Jim Beam
>> Frank Sinatra
>> -------------
>>
>> #!/usr/bin/perl
>
>
> use warnings;
> use strict;
>
>> $infile = "names.txt";
>> $result_file = "most_common.txt";
>>
>> open (FILE, "$infile") or die, "can't open $infile: $!\n";
>> open (OUT, ">$result_file") or die, "can't open $result_file: $!\n";
>>
>> while (<FILE>) {
>>
>> #this is wrong, i know how do i split the first and last names apart?
>>
>> my $first_names = $_ split /W*/;
>
>
> split() returns a list in list context so you need list context on the
> *left* side of the equals sign. If you don't provide list context then
> split() will store its results in the @_ array.
> Unlike in normal regular expressions * is *not* greedy in split()'s
> regular expression.
>
> $ perl -le'$_ = q/ one two /; split /W*/; print ">$_<" for @_'
> > <
> >o<
> >n<
> >e<
> > <
> > <
> > <
> >t<
> >w<
> >o<
> > <
>
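A minimal sketch of the list-context point above, assuming each line holds whitespace-separated words and that the final word is the last name. The helper name split_name is hypothetical and is not part of John's code. It uses the special split ' ' form, which skips leading whitespace and splits on runs of whitespace.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns (first, last) for a line like "John Smith".
# Returns an empty list if the line has fewer than two words.
sub split_name {
    my ($line) = @_;

    # List context on the left: split's fields land in @parts.
    my @parts = split ' ', $line;
    return unless @parts >= 2;

    # Assumption: the final word is the last name, the rest is the first name.
    my $last  = pop @parts;
    my $first = join ' ', @parts;
    return ( $first, $last );
}

my ( $first, $last ) = split_name("John Smith\n");
print "first=$first last=$last\n";
```

Because the trailing newline counts as whitespace, no chomp is needed before the split.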
>
>> # how do i get the remainder from the above split into the last names
>> # array below
>>
>> my @last_names =
>>
>>
>> # Here I'm lost
>> # My guess is that I need to split first and last apart,
>> # into separate first and last arrays
>> # then create a hash table of first and last names, and compare the
>> # last names array to the hash table
>> # then somehow match the last names that match and print them out
>> # I don't know how to sort the contents of infile into a hash or
>> # compare them to the contents of a last names array, or even how to
>> # split first and last into arrays, beyond a vague idea of splitting
>> # them on white space, with the split function and a regular expression
>> # That's as far as I've got, any suggestions, tips, etc much appreciated
>>
>> my %first_last ( 'first' => 'last');
>>
>> foreach (@last_names =~ %first_last) {
>>
>> # i know the above is wrong but don't know
>> # how to sort through the hash for matches
>> # or even how to populate the hash
>>
>> print OUT;
>>
>> }
>> }
>
>
> From your description this should do what you want:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $infile = 'names.txt';
> my $result_file = 'most_common.txt';
>
> open FILE, '<', $infile or die "can't open $infile: $!\n";
> open OUT, '>', $result_file or die "can't open $result_file: $!\n";
>
> my %first_last;
> while ( <FILE> ) {
>
>     # store the full name in a HoA (see perldsc)
>     # using the last name as the key.
>
>     push @{ $first_last{ (split)[ -1 ] } }, $_;
> }
>
>
> # this will print out the full names sorted by the number of
> # entries in the array with the most frequent last names first.
>
> for my $key ( sort { @{ $first_last{ $b } } <=> @{ $first_last{ $a } } }
>               keys %first_last ) {
>     print OUT for @{ $first_last{ $key } };
> }
>
> __END__

Thanks John, I'm going to have a close look at what you've done here,
and will be back to ask some silly questions when I have tried to fully
comprehend your code. It'll take me a little while to digest.
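
John's script prints every name, ordered by how common the last name is. If only the names sharing the single most frequent last name are wanted, which is one possible reading of the original request, the same hash-of-arrays idea can be sketched as below. The helper name most_common and the tie handling (all tied groups are kept) are assumptions made for illustration, not something taken from the thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: given a list of "First Last" lines, return the
# full names whose last name occurs most often (ties are all kept).
sub most_common {
    my @lines = @_;

    my %by_last;    # hash of arrays: last name => [ full names ]
    for my $line (@lines) {
        my @words = split ' ', $line;
        next unless @words;    # skip blank lines
        push @{ $by_last{ $words[-1] } }, join ' ', @words;
    }

    # Find the size of the largest group.
    my $max = 0;
    for my $names ( values %by_last ) {
        $max = @$names if @$names > $max;
    }

    # Keep every group of that size, with last names in sorted order.
    return map  { @{ $by_last{$_} } }
           grep { @{ $by_last{$_} } == $max }
           sort keys %by_last;
}

my @names = ( "John Smith", "Sue Jones", "Dave Smith", "Jim Beam", "Frank Sinatra" );
print "$_\n" for most_common(@names);
```

With the sample names.txt contents from the original post, this prints John Smith and Dave Smith, one per line.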


