Re: script to find most common last names in a file
From: Mark Day (soundz_at_techie.com)
Date: 11/04/04
- In reply to: John W. Krahn: "Re: script to find most common last names in a file"
To: beginners@perl.org Date: Thu, 04 Nov 2004 13:27:34 -0800
John W. Krahn wrote:
> Mark Day wrote:
>
>> Hoping someone can offer some advice on this.
>> I had an idea to write a script that could open a file
>> of names, first and last, that would search for the most frequently
>> occurring last names and write out the first and last names of all
>> those that match to a new file. I am a beginner, and of course I have
>> quickly run up against the limit of my knowledge.
>>
>> Here's what I've got (not much); I hope it makes some sort of sense.
>>
>> names.txt format:
>> --------------
>> John Smith
>> Sue Jones
>> Dave Smith
>> Jim Beam
>> Frank Sinatra
>> -------------
>>
>> #!/usr/bin/perl
>
>
> use warnings;
> use strict;
>
>> $infile = "names.txt";
>> $result_file = "most_common.txt";
>>
>> open (FILE, "$infile") or die, "can't open $infile: $!\n";
>> open (OUT, ">$result_file") or die, "can't open $result_file: $!\n";
>>
>> while (<FILE>) {
>>
>> # this is wrong, I know. How do I split the first and last names apart?
>>
>> my $first_names = $_ split /W*/;
>
>
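> split() returns a list in list context, so you need list context on the
> *left* side of the equals sign. If you don't provide list context then
> split() will store its results in the @_ array (this implicit split to
> @_ was removed in Perl 5.12).
> Also note that /W*/ matches a literal "W", not a non-word character
> (that would be \W). Because W* can match the empty string, split()
> breaks the string into individual characters, spaces included: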
>
> $ perl -le'$_ = q/ one   two /; split /W*/; print ">$_<" for @_'
> > <
> >o<
> >n<
> >e<
> > <
> > <
> > <
> >t<
> >w<
> >o<
> > <
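A minimal sketch of the split the question is after, assuming one name per line with the first name as the first word and the last name as the last word (any middle names are ignored). The sub name split_name and the sample lines are hypothetical, for illustration only. It uses Perl's special split ' ' form, and assigning the result to an array supplies the list context split() needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first and last words of a line.
# split ' ' is the special awk-style split: it ignores leading
# whitespace, splits on runs of whitespace, and discards trailing
# empty fields, so the trailing newline needs no chomp here.
sub split_name {
    my ($line) = @_;
    my @parts = split ' ', $line;    # list context via the array
    return ($parts[0], $parts[-1]);
}

# Hypothetical sample lines standing in for the contents of names.txt.
for my $line ("John Smith\n", "Sue Jones\n", "  Frank   Sinatra \n") {
    my ($first, $last) = split_name($line);
    print "first=<$first> last=<$last>\n";
}
```

Using split ' ' instead of a hand-written pattern like /\W*/ avoids the empty-match behaviour shown above, because the separator is always at least one whitespace character.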
>
>
>> # how do I get the remainder from the above split into the last names
>> # array below?
>>
>> my @last_names =
>>
>>
>> # Here I'm lost.
>> # My guess is that I need to split first and last apart
>> # into separate first and last arrays,
>> # then create a hash table of first and last names, and compare the
>> # last names array to the hash table,
>> # then somehow match the last names that match and print them out.
>> # I don't know how to sort the contents of infile into a hash or
>> # compare them to the contents of a last names array, or even how to
>> # split first and last into arrays, beyond a vague idea of splitting
>> # them on white space with the split function and a regular expression.
>> # That's as far as I've got; any suggestions, tips, etc. much appreciated.
>>
>> my %first_last ( 'first' => 'last');
>>
>> foreach (@last_names =~ %first_last) {
>>
>> # I know the above is wrong, but I don't know
>> # how to sort through the hash for matches,
>> # or even how to populate the hash
>>
>> print OUT;
>>
>> }
>> }
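The "how do I populate the hash" question in the comments above usually comes down to a counting hash keyed by last name. A minimal sketch, assuming the last word on each line is the last name; the sub name count_last_names and the sample data are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each last name occurs.
# The hash key is the last word on the line. An entry that does not
# exist yet starts out undef, which ++ treats as 0 without a warning,
# so no initialisation is needed.
sub count_last_names {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        my @parts = split ' ', $line;
        next unless @parts;          # skip blank lines
        $count{ $parts[-1] }++;
    }
    return %count;
}

# Hypothetical sample data matching the names.txt example.
my @lines = ("John Smith\n", "Sue Jones\n", "Dave Smith\n",
             "Jim Beam\n", "Frank Sinatra\n");
my %count = count_last_names(@lines);

# Most frequent first; ties broken alphabetically so the order is stable.
for my $last (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$last: $count{$last}\n";
}
```

With the sample data this lists Smith with a count of 2, followed by Beam, Jones and Sinatra with 1 each. No separate "last names array" is needed: the hash keys are the distinct last names.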
>
>
> From your description this should do what you want:
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> my $infile = 'names.txt';
> my $result_file = 'most_common.txt';
>
> open FILE, '<', $infile or die "can't open $infile: $!\n";
> open OUT, '>', $result_file or die "can't open $result_file: $!\n";
>
> my %first_last;
> while ( <FILE> ) {
>
> # store the full name in a HoA (see perldsc)
> # using the last name as the key.
>
> push @{ $first_last{ (split)[ -1 ] } }, $_;
> }
>
>
> # this will print out the full names sorted by the number of
> # entries in the array with the most frequent last names first.
>
> for my $key ( sort { @{ $first_last{ $b } } <=> @{ $first_last{ $a } } }
> keys %first_last ) {
> print OUT for @{ $first_last{ $key } };
> }
>
> __END__
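The script above prints every name, ordered by how common its last name is. If only the names sharing the most frequent last name are wanted, the same hash of arrays can be filtered by its largest array size. A minimal sketch, assuming ties should all be kept; the sub name most_common_names and the sample data are hypothetical, and max comes from the core List::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: build a hash of arrays keyed by last name (as in
# the script above), then return only the lines whose last name has the
# highest count. Equally common last names are all kept.
sub most_common_names {
    my (@lines) = @_;
    my %first_last;
    for (@lines) {
        my @parts = split;           # split $_ on whitespace
        next unless @parts;          # skip blank lines
        push @{ $first_last{ $parts[-1] } }, $_;
    }
    return () unless %first_last;
    my $most = max map { scalar @$_ } values %first_last;
    return map  { @{ $first_last{$_} } }
           grep { @{ $first_last{$_} } == $most }
           sort keys %first_last;
}

my @lines = ("John Smith\n", "Sue Jones\n", "Dave Smith\n",
             "Jim Beam\n", "Frank Sinatra\n");
print for most_common_names(@lines);   # prints "John Smith" then "Dave Smith"
```

To write to most_common.txt instead of STDOUT, the final line would become print OUT for most_common_names(@lines); with OUT opened as in the script above.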
Thanks John, I'm going to have a close look at what you've done here,
and will be back to ask some silly questions when I have tried to fully
comprehend your code. It'll take me a little while to digest.