Re: Using hashes to sort number sequences
From: Anno Siegel (anno4000_at_lublin.zrz.tu-berlin.de)
Date: 05/13/04
- Next message: Juha Laiho: "Re: "Definitive Guide to OOP in Perl"?"
- Previous message: Jim Keenan: "Re: Help Me Understand this"
- In reply to: Martin Foster: "Re: Using hashes to sort number sequences"
- Next in thread: gnari: "Re: Using hashes to sort number sequences"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 13 May 2004 17:13:44 GMT
Martin Foster <mdfoster44@netscape.net> wrote in comp.lang.perl.misc:
> Bob Walton <invalid-email@rochester.rr.com> wrote in message
> news:<40A2E65B.1020108@rochester.rr.com>...
> > Martin Foster wrote:
> >
> > ...
> > > I have two files: a.txt & b.txt
> > >
> > > a.txt=
> > > 191_6_270328 T1 4 10 19 34 55 72 88 116 157 200 280 332 388 451 756 4
> > > 0 5 0 4 0 6 2 6 2 8 0
> > > 191_6_270328 T2 4 9 17 22 34 56 83 112 146 181 266 320 376 431 665 3 0
> > ...
> > > b.txt=
> > > 191_6_9908682 T1 4 8 14 25 41 60 83 115 153 190 276 321 374 437 694 4
> > > 0 4 0 4 0 6 0 4 0 8 0
> > > 191_6_9908682 T2 4 10 19 30 44 64 92 122 155 198 285 338 394 446 739 4
> > > 0 5 0 4 0 6 0 8 0 8 2
> > ...
[...]
> > Why don't you just sort (using the Unix or maybe even the Win32 sort
[...]
> I may need to tell you a little more about the data, I'm not sure a sort
> would help me but maybe you have an idea.
>
> Each $name tag is the name of a crystal structure. Each T1, T2, etc describes
> an atom. For each structure there are six atoms. To identify if two crystal
> structures are the same, one can compare the coordination sequences ( the number
> sequences that follow the T1, T2, etc). For each structure all six sequences,
> must completely match another six sequences of another structure, but they can
> be in any order, ie T1, T2s may be called T3, T6 or whatever. The important
> part is that each structure has six lines, which is why I want to read
> them in separately. If I do a sort I will get matching lines of sequences
> grouped together. For some structures, only one or two lines will match the
> original structure and I will have to do careful counting throughout the
> output to get what I want.

If I understand that correctly, there is a set of atoms (each represented
by a sequence of numbers), and a crystal (structure) is a sequence of six
atoms. The problem is to find the structures whose atom sequences are
permutations of each other.
If I got that entirely wrong, you can stop reading now.

Otherwise, the straightforward solution does indeed involve sorting, not
of the file as a whole but of each set of six atoms. After sorting,
two permutations of the same atoms are equal (no matter how you sort, as
long as both are sorted the same way). This reduces the problem to finding
the elements in a list that are the same. Perl's standard solutions
(involving a hash) apply.
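
To make that concrete, here is a rough sketch of the sort-then-hash idea. It assumes one atom per line in the form "name label numbers...", which is a guess at the file layout (the quoted samples are line-wrapped), and the data and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Assumed input: one atom per line, "<structure name> <atom label> <numbers...>".
# These short sample lines are invented, not taken from a.txt or b.txt.
my @lines = (
    'A T1 4 10 19',
    'A T2 4 9 17',
    'B T1 4 9 17',
    'B T2 4 10 19',
    'C T1 4 8 14',
    'C T2 4 10 19',
);

# Collect each structure's sequences as strings, ignoring the atom labels.
my %atoms_of;
for my $line (@lines) {
    my ($name, $label, @seq) = split ' ', $line;
    push @{ $atoms_of{$name} }, "@seq";
}

# Sorting a structure's sequences gives a canonical key that is the same
# for every permutation of the same atoms. Group names by that key.
my %names_by_key;
for my $name (sort keys %atoms_of) {
    my $key = join '|', sort @{ $atoms_of{$name} };
    push @{ $names_by_key{$key} }, $name;
}

# Any key shared by more than one name marks structures that match.
my @groups = grep { @$_ > 1 } values %names_by_key;
print "same structure: @$_\n" for @groups;
```

A plain string sort is enough here because the order only has to be consistent, not numerically meaningful.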

In the actual case it may pay to re-encode the atoms with shorter
strings, which would save storage and might reduce sort time. I'm
not sure about the effect of key length on Perl's string sort. Uri?
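
One way the re-encoding could look, as a sketch only: hand out a small integer the first time each distinct sequence is seen and build the structure key from those integers. The helper name atom_id and the sample sequences are hypothetical, and whether this actually speeds anything up is untested:

```perl
use strict;
use warnings;

# Hypothetical helper: return a small integer ID for a sequence string,
# assigning the next free ID the first time a sequence is seen.
my %id_of;
my $next_id = 0;
sub atom_id {
    my ($seq) = @_;
    $id_of{$seq} = $next_id++ unless exists $id_of{$seq};
    return $id_of{$seq};
}

# Invented example structure with three atoms, two of them identical.
my @structure = ('4 10 19 34 55', '4 9 17 22 34', '4 10 19 34 55');

# The canonical key is now a short, sorted list of IDs instead of the
# full number sequences.
my $key = join ',', sort { $a <=> $b } map { atom_id($_) } @structure;
print "$key\n";
```

The IDs depend on the order in which sequences are first encountered, so keys are only comparable within a single run that shares one %id_of table.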

How many different atoms are there? If they represent actual chemical
elements there can't be too many.

Before I go any further, I'd like some feedback on whether this sounds
plausible at all.

Anno