Re: sorting with expensive compares?




Dan Stromberg wrote:
> On Thu, 22 Dec 2005 22:06:42 +0000, Dan Stromberg wrote:
>
> >
> > Hi folks.
> >
> > Python appears to have a good sort method, but when sorting array elements
> > that are very large, and hence have very expensive compares, is there some
> > sort of already-available sort function that will merge like elements into
> > a chain, so that they won't have to be recompared as many times?
> >
> > Thanks!
>
> I probably should've been more specific.
>
> I'm wanting to sort a large number of files, like a bunch of output files
> from a large series of rsh or ssh outputs on a large series of distinct
> machines, a music collection in .ogg format (strictly redistributable and
> legally purchased music), a collection of .iso cdrom images (strictly
> redistributable and legally purchased software), and so forth.
>
> I'm not sorting the content of each file individually.
>
> I'm treating each file as a potentially very large string, and "sorting
> the strings".
>
> I've been using the following compare function, which in short checks, in
> order:
>
> 1) device number
> 2) inode number
> 3) file length
> 4) the beginning of the file
> 5) an md5 hash of the entire file
> 6) the entire file
>
> (If #1 and #2 are identical, then the file must be a hardlink to the other
> file. Also, files that do not have the same length can never be
> identical. And of course these items are generated on demand, making it
> frequently possible to avoid doing full-file-length compare on a lot of
> files)
>
> However, my program is still doing more #6 comparisons than seems
> strictly necessary when I could just toss all the filenames describing
> identical files into a list, and avoid re-comparing files with identical
> content over and over - I don't want to compare them to each other again
> and again), when there are a lot of identical files in the input list.
>
Why would #5 not enough as an indicator that the files are indentical ?

.



Relevant Pages

  • Re: list view question
    ... sorting columns in .net is not as simple as in VB. ... sort data other than strings and columns other than first column. ... exposes compare method to compare two objects. ...
    (microsoft.public.dotnet.languages.vb.controls)
  • Re: Sorting algorithm when comparison is heavy
    ... asking about sorting things with very expensive compare operations. ... so speed things up by using your method to sort them in, ... single digit of the long value. ...
    (comp.programming)
  • Re: sorting with expensive compares?
    ... > Python appears to have a good sort method, but when sorting array elements ... I'm not sorting the content of each file individually. ... I've been using the following compare function, which in short checks, in ...
    (comp.lang.python)
  • Re: Fastcode: Corrected version of SortShaUnit
    ... >functions and b) sorting large lists. ... The thing is, too, that any sort usually is as good as its compare ... expensive compare function for combsort to be super-beneficial. ...
    (borland.public.delphi.language.basm)
  • Re: sorting with expensive compares?
    ... > I'm wanting to sort a large number of files, like a bunch of output files ... > content over and over - I don't want to compare them to each other again ... when there are a lot of identical files in the input list. ...
    (comp.lang.python)