Re: Filtering a file



Dermot Paikkos wrote:
> On 6 Dec 2005 at 8:14, Adedayo Adeyeye wrote:
>
> > Hello David,
> >
> > I'm able to open the file, read the contents and output the results of
> > my initial filtering to a new file.
> >
> > The problem is that my new file has duplicate entries, and cleaning up
> > duplicates is where I'm stuck.
> >
> > Kind regards
>
> > On Mon, Dec 05, 2005 at 02:20:33PM +0100, Adedayo Adeyeye wrote:
> > > How do I write a script to parse through this file and just
> > > return the unique names. Ie I want the repetitions ignored.
> >
> > What have you tried? Where are you stuck? (Opening the file? Reading
> > the contents? The actual filtering?). Nothing in your question is CGI
> > related, have you got this working as a command line script but are
> > having trouble converting it to work under CGI? What code have you got
> > so far?
> >
>
> A long time ago someone on this list showed me exactly how to do
> this. I guess they are not subscribed anymore. You can find this
> code in a couple of places too (e.g. the Perl Cookbook, p. 147).
>
> #!/usr/bin/perl -w
> # Open and compare two files of records.
> # Report only the records that are NOT in both files.
>
> ## Always use strict. It can help make debugging a lot easier.
> use strict;
> # Take both input files and the output file from the command line.
> my ($file1, $file2, $output) = @ARGV;
> my %lns;
>
>
> open(FH,"< $file1") || die "Can't open $file1: $!\n";
> while (<FH>)
> {
> chomp;
> $lns{$_} = 0;
> }
> close(FH);
>
> open(FH2,"< $file2") || die "Can't open second file: $!\n";
> while (<FH2>)
> {
> chomp;
> if (defined $lns{$_})
> { $lns{$_} = 1; }
> else
> { $lns{$_} = 0; }
> }
> close(FH2);
>
> open(REPORT,"> $output") || die "Can't create $output: $!\n";
> foreach my $ln (keys %lns)
> {
> print REPORT "$ln\n" if ($lns{$ln} == 0);
> }
> close(REPORT);
>
> You can obviously flip the test in the last loop to report the
> duplicates instead: print REPORT "$ln\n" if ($lns{$ln} == 1);
>
> Please check this. I haven't had time to create any data to test it
> against.

That will not preserve the order of the lines. This will:

perl -ne 'print unless $seen{$_}++' file1 file2

--
Brad
