Re: Parsing Large Files
xhoster_at_gmail.com
Date: 12/21/04
- Maybe in reply to: Anno Siegel: "Re: Parsing Large Files"
Date: 21 Dec 2004 02:44:33 GMT
BigDaDDY <ihatespam@hotmail.com> wrote:
> All,
>
> I recently received a reply to a previous post which is almost the answer
> I needed. The problem is, when I tried it at work, it wouldn't work.
> The reason it didn't work was because we have an early version of Perl at
> work which does not support "values" as in:
That's highly unlikely.
>
> my(%nid, %id);
>
> foreach (values %id) {
>     foreach my $y (keys %$_) {
>         foreach (values %{$_->{$y}}) {
>             die "duplicate $_!" if exists $nid{$_};
>             $nid{$_}=$y;
>         }
>     }
> }
You could use "keys" instead of "values", but of course then you would
have to change the corresponding accesses:
    foreach my $x (keys %id) {
        foreach my $y (keys %{$id{$x}}) {
            etc.
But since values *does* work, figure out what makes you think it doesn't.
Maybe my code has an error I overlooked.
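For reference, a keys-only traversal written out in full might look like the sketch below. The sample %id contents are made up for illustration, and unlike the quoted code it stores all three coordinates per ID (as an array reference) rather than just Y, on the assumption that the X values are wanted too.

```perl
use strict;
use warnings;

# Hypothetical sample data: $id{x}{y}{z} = id number.
my %id = (
    0 => { 0 => { 0 => 101, 1 => 102 } },
    1 => { 0 => { 0 => 103 }, 2 => { 0 => 104 } },
);

my %nid;    # id number => [x, y, z]

# Walk every level by key, so "values" is never needed.
foreach my $x (keys %id) {
    foreach my $y (keys %{$id{$x}}) {
        foreach my $z (keys %{$id{$x}{$y}}) {
            my $n = $id{$x}{$y}{$z};
            die "duplicate $n!" if exists $nid{$n};
            $nid{$n} = [$x, $y, $z];
        }
    }
}

# Print the inverted mapping, ordered by id number.
foreach my $n (sort { $a <=> $b } keys %nid) {
    print "$n => @{$nid{$n}}\n";
}
```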
>
> Can this be done in a different way that doesn't use values? Also, I do
> care about the x-values as well, so I don't think the code above will
> work as is. Basically, I'm parsing one file to get an ID given an x,y,z
> cartesian coordinate. This gives me the hash %id which is keyed by
> {"x"}{"y"}{"z"} coordinates, whose values are id numbers.
Rather than massaging %id after the fact, just change the way you parse the
first file so that you get the proper data structure right from the start.
Since you want to search the second (big) file by ID, ID should be the
hash key. Since you also need the Y and X (and possibly Z) for each ID,
store them as the value under that key. I'd probably just store them as ordered
3-element arrays nested in the hash (slot 0 is X, slot 1 is Y, slot 2 is
Z), but I think most would prefer to use hashes with labels, rather than
ordered arrays, so that is what I'll show:
    my %hash;
    while (<$file1>) {
        chomp;
        my ($id,$x,$y,$z) = split;  # or whatever it takes to parse
        die "Duplicate $id" if exists $hash{$id};
        $hash{$id} = {X=>$x, Y=>$y, Z=>$z};
    }
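For comparison, the ordered-array variant mentioned above could be sketched as follows. The input lines are hypothetical stand-ins for the first file, assuming whitespace-separated "id x y z" records; adjust the split to the real format.

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for the first file: "id x y z".
my @lines = ("101 0 0 0\n", "102 0 0 1\n", "103 1 0 0\n");

my %hash;    # id => [x, y, z]  (slot 0 is X, slot 1 is Y, slot 2 is Z)
foreach my $line (@lines) {
    chomp $line;
    my ($id, $x, $y, $z) = split ' ', $line;
    die "Duplicate $id" if exists $hash{$id};
    $hash{$id} = [$x, $y, $z];
}

# Look up X and Y for one id with an array slice.
my ($x, $y) = @{$hash{103}}[0, 1];
print "id 103 is at x=$x, y=$y\n";
```

The arrays take a little less memory than one small hash per ID, at the cost of having to remember which slot is which.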
> Then I close
> this file, and open a large results file which only contains ids with
> results. I want to pull out all values for a unique y into a separate
> file
Probably not a separate file yet. It is easier to sort in memory than
to sort files from perl. (Assuming, like before, that all the *relevant*
data from the big file can fit in memory)
    my %y_stuff; # hash by Y of hash by X of list of data
    while (<$file2>) {
        chomp;
        my ($id,$data) = split;  # or whatever it takes to parse
        next unless exists $hash{$id};  # don't store lines we don't care about
        # just to make things clearer, I'll use intermediate variables
        # Otherwise there are just too darn many braces in the push...
        my $x = $hash{$id}{X};
        my $y = $hash{$id}{Y};
        push @{$y_stuff{$y}{$x}}, $data;
    }
    foreach my $y (keys %y_stuff) {
        ## open your file for this Y
        foreach my $x ( sort {$a<=>$b} keys %{$y_stuff{$y}} ) {
            ## format and print your data
            ### (it is still not clear how you want the data sorted
            ### within each X-Y group. I'll assume it is the order in
            ### which it was first seen)
            print "@{$y_stuff{$y}{$x}}\n";
        }
        ## close your file
    }
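The "open your file" and "close your file" placeholders above could be filled in along these lines. This is only a sketch: the sample %y_stuff contents, the scratch directory, and the y_<Y>.txt naming scheme are all assumptions made for illustration, not something specified in the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical data in the shape built above: $y_stuff{y}{x} = [data, ...].
my %y_stuff = (
    0 => { 10 => ['a', 'b'], 2 => ['c'] },
    5 => { 1  => ['d'] },
);

my $dir = tempdir(CLEANUP => 1);    # scratch directory for this sketch

foreach my $y (sort { $a <=> $b } keys %y_stuff) {
    my $path = "$dir/y_$y.txt";     # hypothetical naming scheme
    open my $out, '>', $path or die "Cannot open $path: $!";
    # Within one Y, write the X groups in ascending numeric order.
    foreach my $x (sort { $a <=> $b } keys %{$y_stuff{$y}}) {
        print {$out} "$x\t@{$y_stuff{$y}{$x}}\n";
    }
    close $out or die "Cannot close $path: $!";
}
```

Opening one file per Y inside the outer loop keeps only a single output handle open at a time, which matters if there are many distinct Y values.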
Xho