Re: Looking for lots of words in lots of files



Upload, wait, and google them.

Seriously though, short of using a real indexer, I would build a set of the words I'm looking for, then loop over each file and over the words in it, checking each word for membership in the set. When a word matches, add it to a dict mapping file names to lists of words found, and stop reading that file once its list reaches 10 entries. It's not a complicated solution, and performance should be fine: set lookups are constant time on average, so the cost is roughly one pass over each file no matter how many words you're searching for.
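A rough sketch of that idea, in case it helps. The names find_words and words_in are made up for illustration, and the tokenizing is an assumption on my part: everything is lowercased and split on runs of ASCII letters, digits, and apostrophes, so adjust it to whatever a "word" means in your files.

```python
import re

# Assumption: a "word" is a run of lowercase ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def words_in(path):
    """Yield lowercased words from a file, one line at a time."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield from WORD_RE.findall(line.lower())


def find_words(paths, wanted, limit=10):
    """Map each file name to the first `limit` distinct wanted words found in it."""
    wanted = {w.lower() for w in wanted}  # set gives fast membership checks
    found = {}
    for path in paths:
        hits = []
        for token in words_in(path):
            # `hits` never grows past `limit`, so the list check stays cheap.
            if token in wanted and token not in hits:
                hits.append(token)
                if len(hits) == limit:
                    break  # stop reading this file early
        found[str(path)] = hits
    return found
```

Calling find_words(list_of_paths, list_of_words) returns a dict like {"a.txt": ["fox", "dog"], ...}, with words listed in the order they first appear in each file.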

If you need to run this more than once, use an indexer.

If you only need to use it once, use an indexer, so you learn how for next time.
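To give a feel for what an indexer buys you, here is a toy in-memory inverted index. It is not a real indexer, only a sketch of the idea: read every file once, remember which files each word appears in, and answer later queries from that table without touching the files again. The names build_index and lookup are hypothetical, and the tokenizing (lowercased runs of ASCII letters, digits, and apostrophes) is an assumption.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of lowercase ASCII letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(paths):
    """Read each file once and map every word to the set of files containing it."""
    index = defaultdict(set)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                for token in WORD_RE.findall(line.lower()):
                    index[token].add(str(path))
    return index


def lookup(index, wanted, limit=10):
    """Map file names to up to `limit` of the wanted words, using only the index."""
    found = defaultdict(list)
    # Deduplicate and sort the query words so the result order is predictable.
    for word in sorted({w.lower() for w in wanted}):
        for name in index.get(word, ()):
            if len(found[name]) < limit:
                found[name].append(word)
    return dict(found)
```

Building the index costs one full pass over the files, so it only pays off when you query more than once. A real indexer also persists the table to disk and handles updates, which this sketch does not.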

On Jun 18, 2008, at 10:28 AM, brad wrote:

Just wondering if anyone has ever solved this efficiently... not looking for specific solutions tho... just ideas.

I have one thousand words and one thousand files. I need to read the files to see if some of the words are in the files. I can stop reading a file once I find 10 of the words in it. It's easy for me to do this with a few dozen words, but a thousand words is too large for an RE and too inefficient to loop, etc. Any suggestions?

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
