Re: Parsing Large Files

xhoster_at_gmail.com
Date: 12/21/04


Date: 21 Dec 2004 02:44:33 GMT

BigDaDDY <ihatespam@hotmail.com> wrote:
> All,
>
> I recently received a reply to a previous post which is almost the answer
> I needed. The problem is, when I tried it at work, it wouldn't work.
> The reason it didn't work was because we have an early version of Perl at
> work which does not support "values" as in:

That's highly unlikely.

>
> my(%nid, %id);
>
> foreach (values %id) {
> foreach my $y (keys %$_) {
> foreach (values %{$_->{$y}}) {
> die "duplicate $_!" if exists $nid{$_};
> $nid{$_}=$y;
> }
> }
> }

You could use "keys" instead of "values", but of course then you would
have to change the corresponding accesses:

foreach my $x (keys %id) {
  foreach my $y ( keys %{$id{$x}}) {
etc.
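
Spelled out in full, the keys-only version might look like the sketch
below. The sample contents of %id are made up for illustration; I am
assuming the $id{x}{y}{z} = id layout described in the question.

```perl
use strict;
use warnings;

# Made-up sample data; assumed layout is $id{x}{y}{z} = id number
my %id = (
    0 => { 0 => { 0 => 101, 1 => 102 } },
    1 => { 0 => { 0 => 103 }, 5 => { 2 => 104 } },
);

my %nid;    # maps id number => y coordinate
foreach my $x (keys %id) {
    foreach my $y (keys %{ $id{$x} }) {
        foreach my $z (keys %{ $id{$x}{$y} }) {
            my $n = $id{$x}{$y}{$z};    # the id stored at (x, y, z)
            die "duplicate $n!" if exists $nid{$n};
            $nid{$n} = $y;
        }
    }
}

# Show the resulting id => y mapping in id order
print "$_ => $nid{$_}\n" for sort { $a <=> $b } keys %nid;
```

Because $x is now a named loop variable, it is available inside the
innermost loop if you want to record it along with $y.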

But since values *does* work, figure out what makes you think it doesn't.
Maybe my code has an error I overlooked.

>
> Can this be done in a different way that doesn't use values? Also, I do
> care about the x-values as well, so I don't think the code above will
> work as is. Basically, I'm parsing one file to get an ID given an x,y,z
> cartesian coordinate. This gives me the hash %id which is keyed by
> {"x"}{"y"}{"z"} coordinates, whose values are id numbers.

Rather than massaging %id after the fact, just change the way you parse the
first file so that you get the proper data structure right from the start.
Since you want to search the second (big) file by ID, ID should be the
hash key. Since you want to have the Y and X (and possibly Z), store
both the Y and X (and Z). I'd probably just store them as ordered
3-element arrays nested in the hash (slot 0 is X, slot 1 is Y, slot 2 is
Z), but I think most would prefer to use hashes with labels, rather than
ordered arrays, so that is what I'll show:

my %hash;
while (<$file1>) {
  chomp;
  my ($id,$x,$y,$z) =split; # or whatever it takes to parse
  die "Duplicate $id" if exists $hash{$id};
  $hash{$id} = {X=>$x, Y=>$y, Z=>$z};
};
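
For comparison, the ordered-array variant mentioned above could be
sketched like this. The in-memory "file" and its whitespace-separated
"id x y z" layout are assumptions for the sake of a runnable example
(opening a filehandle on a string needs Perl 5.8 or later); substitute
your real filehandle and parsing.

```perl
use strict;
use warnings;

# Stand-in for the first file; the "id x y z" line layout is an assumption
my $text = "101 0 0 0\n102 0 0 1\n104 1 5 2\n";
open my $file1, '<', \$text or die "Cannot open in-memory file: $!";

my %hash;
while (<$file1>) {
    chomp;
    my ($id, $x, $y, $z) = split;    # or whatever it takes to parse
    die "Duplicate $id" if exists $hash{$id};
    $hash{$id} = [$x, $y, $z];       # slot 0 is X, slot 1 is Y, slot 2 is Z
}
close $file1;

# Look up the coordinates for one ID
my ($x, $y, $z) = @{ $hash{104} };
print "id 104 is at x=$x y=$y z=$z\n";
```

The arrays take a little less memory than the labelled hashes, at the
cost of having to remember which slot is which.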

> Then I close
> this file, and open a large results file which only contains ids with
> results. I want to pull out all values for a unique y into a separate
> file

Probably not a separate file yet. It is easier to sort in memory than
to sort files from Perl. (This assumes, as before, that all the *relevant*
data from the big file can fit in memory.)

my %y_stuff; # hash by Y of hash by X of list of data

while (<$file2>) {
  chomp;
  my ($id,$data)=split; #or whatever it takes to parse
  next unless exists $hash{$id}; #don't store lines we don't care about

  # just to make things clearer, I'll use intermediate variables
  # Otherwise there are just too darn many braces in the push...
  my $x = $hash{$id}{X};
  my $y = $hash{$id}{Y};

  push @{$y_stuff{$y}{$x}}, $data;
};

foreach my $y (keys %y_stuff) {
  ## open your file for this Y
  foreach my $x ( sort {$a<=>$b} keys %{$y_stuff{$y}} ) {
     ## format and print your data
     ### (it is still not clear how you want the data sorted
     ### within each X-Y group. I'll assume it is the order in
     ### which it was first seen)
     print "@{$y_stuff{$y}{$x}}\n";
  };
  ## close your file
};
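
The "open your file for this Y" placeholders could be filled in roughly
as follows. This is only a sketch: the y_<value>.txt naming scheme, the
scratch directory, the "x: data" line format and the sample data are all
my assumptions, not anything from your description.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Made-up sample of the structure: Y => X => list of data
my %y_stuff = (
    0 => { 10 => ['a', 'b'], 2 => ['c'] },
    5 => { 1  => ['d'] },
);

my $dir = tempdir(CLEANUP => 1);    # scratch directory, removed at exit

foreach my $y (sort { $a <=> $b } keys %y_stuff) {
    my $name = "$dir/y_$y.txt";     # one output file per Y (assumed naming)
    open my $out, '>', $name or die "Cannot write $name: $!";
    # X values in numeric order; data in the order it was first seen
    foreach my $x (sort { $a <=> $b } keys %{ $y_stuff{$y} }) {
        print {$out} "$x: @{ $y_stuff{$y}{$x} }\n";
    }
    close $out or die "Cannot close $name: $!";
}
```

Only one output file is open at a time this way, so the number of
distinct Y values is not limited by the number of available filehandles.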

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB

