Re: Directory sorted in chronological order
From: Uri Guttman (uri_at_stemsystems.com)
Date: 05/06/04
- Next message: David K. Wall: "Re: read from comma delimited file"
- Previous message: A. Sinan Unur: "Re: Books online????"
- In reply to: Sam Holden: "Re: Directory sorted in chronological order"
- Next in thread: Marshall Dudley: "Re: Directory sorted in chronological order"
- Reply: Marshall Dudley: "Re: Directory sorted in chronological order"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 06 May 2004 00:50:53 GMT
>>>>> "SH" == Sam Holden <sholden@flexal.cs.usyd.edu.au> writes:
SH> On Thu, 06 May 2004 01:23:35 +0200, Tore Aursand <tore@aursand.no> wrote:
>> On Wed, 05 May 2004 16:24:00 -0600, Jim Cochrane wrote:
>>> my @dir =
>>> map { $_->[0] }
>>> sort { $a->[1] <=> $b->[1] }
>>> map { [$_, -M] } readdir(DIR);
>>
>> my @dir = sort { -M $a <=> -M $b } readdir( DIR );
>>
SH> The ST is an optimisation technique and as such is never neccesary.
SH> However, this case is one where it is almost certainly useful. -M
SH> results in a stat system call and system calls are slow, even
SH> worse it may require waiting dor a disk read which is even slower
SH> (though filesystem level caching should prevent that happening
SH> more than once on machines without much other disk access
SH> happening). To make matters even worse stat calls need to access
SH> the file based on its name, which is linear in the number of files
SH> in the directory.
SH> So each -M in the sort comparison function is O(N) (assuming a crappy
SH> filesystem with such linear access - which is normal) with possibly a
SH> large constant (hitting the disk). The sort comparison function is
SH> called O(N*logN) times and hence we have O(N*N*logN).
SH> The ST version gives us an O(1) comparison function and hence O(N*logN)
SH> for the sort, which is *much* better.
all of the sort optimization techniques (ST, GRT and orcish) do
that. the real savings is in converting any O( N log N ) work to O(N)
work (running -M on each file only once). you have to count the work of
-M as a constant in general. it may be O(N) when scanning the dir but
that will be cached in ram for sure as it will be hit often. the inodes
could be scattered around (or clustered on some file systems) but that
could be more disk or other complex access. disk accesses outweigh any
linear scan in ram times so ignore the file name search time.
SH> This is the first example I've seen where the ST actually reduces the
SH> runtime complexity of the sort by more than a constant factor - since
SH> it's the first time I've noticed a sort where the comparison function
SH> isn't O(1) in terms of the size of the list being sorted.
your analysis is slightly off IMO. you need to count inode accesses as
the main work and that is still O(N log N) and with caching who knows?
factoring out the -M to O(N) time is the win.
and the GRT factors out even more by removing the callback code itself
and using the internal cmp function.
and a perfect time to plug my soon to be released Sort::Maker
module. here is how it would be used to solve this:
use Sort::Maker ;
my $M_sorter = make_sorter( 'ST', number => '-M' ) ;
my @sorted = $M_sorter->( readdir(DIR) ) ;
or with the GRT for more speed:
my $M_sorter = make_sorter( 'GRT',
number => {
code => '-M',
unsigned_float => 1,
}
) ;
my @sorted = $M_sorter->( readdir(DIR) ) ;
if anyone wants to get a beta of this module i can get you the url. it
works for all 4 sort styles and has decent pod. it need many more tests,
benchmarks and examples in pod. if you want to be a beta tester and
provide any feedback, email me. my goal is to release it before
yapc::buffalo as i am giving a talk on it there.
uri
-- Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
- Next message: David K. Wall: "Re: read from comma delimited file"
- Previous message: A. Sinan Unur: "Re: Books online????"
- In reply to: Sam Holden: "Re: Directory sorted in chronological order"
- Next in thread: Marshall Dudley: "Re: Directory sorted in chronological order"
- Reply: Marshall Dudley: "Re: Directory sorted in chronological order"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|