Re: removing duplication from a huge list.



odeits:
How big of a list are we talking about? If the list is so big that the
entire list cannot fit in memory at the same time, this approach won't
work, e.g. removing duplicate lines from a very large file.

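If the data are lines of a file, and keeping the original order isn't
important, then the first thing to try may be the Unix commands sort and
uniq (available on Windows through Cygwin). sort brings identical lines
next to each other, spilling to temporary files when the input doesn't
fit in RAM, and uniq then drops the adjacent repeats.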
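If you'd rather stay inside Python, here is a minimal sketch of what
sort + uniq do under the hood: an external merge sort followed by
dropping adjacent duplicates. The function name dedup_large_file and
the chunk_lines parameter are my own illustrative choices, not part of
any library. It assumes UTF-8 text, and like sort | uniq it returns the
unique lines in sorted (code point) order, not the original order.

```python
import heapq
import itertools
import os
import tempfile


def _spill_sorted_chunk(lines, tmpdir):
    """Sort one in-memory chunk of lines and write it to a temp file."""
    lines.sort()
    fd, path = tempfile.mkstemp(dir=tmpdir, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return path


def dedup_large_file(src, dst, chunk_lines=100_000):
    """Write the unique lines of src to dst in sorted order, keeping at
    most chunk_lines lines in memory at once (roughly `sort src | uniq`).
    Hypothetical helper, written for illustration only."""
    with tempfile.TemporaryDirectory() as tmpdir:
        chunk_paths = []
        with open(src, encoding="utf-8") as f:
            while True:
                chunk = list(itertools.islice(f, chunk_lines))
                if not chunk:
                    break
                # The last line of a file may lack "\n"; normalize it so
                # "x" and "x\n" count as the same line.
                chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
                chunk_paths.append(_spill_sorted_chunk(chunk, tmpdir))

        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst, "w", encoding="utf-8") as out:
                previous = None
                # heapq.merge streams the sorted chunks lazily, so equal
                # lines arrive consecutively, which is what uniq relies on.
                for line in heapq.merge(*files):
                    if line != previous:
                        out.write(line)
                        previous = line
        finally:
            for fh in files:
                fh.close()
```

Memory use is bounded by chunk_lines plus one buffered line per
temporary file; a smaller chunk size means less RAM but more temporary
files to merge.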

Bye,
bearophile