Re: Directory sorted in chronological order
From: Marshall Dudley (mdudley_at_king-cart.com)
Date: 05/06/04
- Next message: Toni Erdmann: "Re: Use of goto after an alarm?"
- Previous message: Web Surfer: "Re: newbie regexp question"
- In reply to: Uri Guttman: "Re: Directory sorted in chronological order"
- Next in thread: Uri Guttman: "Re: Directory sorted in chronological order"
- Reply: Uri Guttman: "Re: Directory sorted in chronological order"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 06 May 2004 10:41:22 -0400
Here are the stats for the two with a directory with just under 2,000 files in it:
# time perl testtime1.pl
number of files = 1864
0.046u 0.092s 0:00.13 100.0% 16+381k 0+0io 0pf+0w
# time perl testtime2.pl
number of files = 1864
0.021u 0.014s 0:00.03 100.0% 20+840k 0+0io 0pf+0w
And this is the code for testtime1.pl:
opendir (DIR,"./");
my @dir = sort { -M $a <=> -M $b } readdir( DIR );
close DIR;
my $number_of_files = scalar @dir;
print "number of files = $number_of_files\n";
and for testtime2.pl:
opendir (DIR, "./");
my @dir =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, -M] } readdir(DIR);
closedir DIR;
my $number_of_files = scalar @dir;
print "number of files = $number_of_files\n";
Marshall
Uri Guttman wrote:
> >>>>> "SH" == Sam Holden <sholden@flexal.cs.usyd.edu.au> writes:
>
> SH> On Thu, 06 May 2004 01:23:35 +0200, Tore Aursand <tore@aursand.no> wrote:
> >> On Wed, 05 May 2004 16:24:00 -0600, Jim Cochrane wrote:
> >>> my @dir =
> >>> map { $_->[0] }
> >>> sort { $a->[1] <=> $b->[1] }
> >>> map { [$_, -M] } readdir(DIR);
> >>
> >> my @dir = sort { -M $a <=> -M $b } readdir( DIR );
> >>
> SH> The ST is an optimisation technique and as such is never neccesary.
>
> SH> However, this case is one where it is almost certainly useful. -M
> SH> results in a stat system call and system calls are slow, even
> SH> worse it may require waiting dor a disk read which is even slower
> SH> (though filesystem level caching should prevent that happening
> SH> more than once on machines without much other disk access
> SH> happening). To make matters even worse stat calls need to access
> SH> the file based on its name, which is linear in the number of files
> SH> in the directory.
>
> SH> So each -M in the sort comparison function is O(N) (assuming a crappy
> SH> filesystem with such linear access - which is normal) with possibly a
> SH> large constant (hitting the disk). The sort comparison function is
> SH> called O(N*logN) times and hence we have O(N*N*logN).
>
> SH> The ST version gives us an O(1) comparison function and hence O(N*logN)
> SH> for the sort, which is *much* better.
>
> all of the sort optimization techniques (ST, GRT and orcish) do
> that. the real savings is in converting any O( N log N ) work to O(N)
> work (running -M on each file only once). you have to count the work of
> -M as a constant in general. it may be O(N) when scanning the dir but
> that will be cached in ram for sure as it will be hit often. the inodes
> could be scattered around (or clustered on some file systems) but that
> could be more disk or other complex access. disk accesses outweigh any
> linear scan in ram times so ignore the file name search time.
>
> SH> This is the first example I've seen where the ST actually reduces the
> SH> runtime complexity of the sort by more than a constant factor - since
> SH> it's the first time I've noticed a sort where the comparison function
> SH> isn't O(1) in terms of the size of the list being sorted.
>
> your analysis is slightly off IMO. you need to count inode accesses as
> the main work and that is still O(N log N) and with caching who knows?
> factoring out the -M to O(N) time is the win.
>
> and the GRT factors out even more by removing the callback code itself
> and using the internal cmp function.
>
> and a perfect time to plug my soon to be released Sort::Maker
> module. here is how it would be used to solve this:
>
> use Sort::Maker ;
>
> my $M_sorter = make_sorter( 'ST', number => '-M' ) ;
>
> my @sorted = $M_sorter->( readdir(DIR) ) ;
>
> or with the GRT for more speed:
>
> my $M_sorter = make_sorter( 'GRT',
> number => {
> code => '-M',
> unsigned_float => 1,
> }
> ) ;
>
> my @sorted = $M_sorter->( readdir(DIR) ) ;
>
> if anyone wants to get a beta of this module i can get you the url. it
> works for all 4 sort styles and has decent pod. it need many more tests,
> benchmarks and examples in pod. if you want to be a beta tester and
> provide any feedback, email me. my goal is to release it before
> yapc::buffalo as i am giving a talk on it there.
>
> uri
>
> --
> Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
> --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
> Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
- Next message: Toni Erdmann: "Re: Use of goto after an alarm?"
- Previous message: Web Surfer: "Re: newbie regexp question"
- In reply to: Uri Guttman: "Re: Directory sorted in chronological order"
- Next in thread: Uri Guttman: "Re: Directory sorted in chronological order"
- Reply: Uri Guttman: "Re: Directory sorted in chronological order"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|