Re: Find/Replace in TStringList loses lots of data



As you have a really large number of files, which are themselves
quite large, I would seriously consider using a stream (as Bruce
suggests) and a Boyer-Moore-Horspool algorithm for searching.

This would be approx 9 times faster for the string you quote. It is
faster than a character-by-character search because it first looks only
at the Nth file character (where N is the search string length), i.e.
the one lined up with the end of the search string. If that character
does not occur in the search string at all, it jumps on N characters.
If it does occur in the search string, it jumps on just far enough to
line that character up with its position in the search string (for
example if it found an "X" it would jump on 4 characters). If the
character matches the last search string character, it searches
backwards, checking each individual character of the search string in
turn. If all the search string characters match then it has found the
word.

Basically one sets up a 256-element array of byte values, one element
per possible character. Each element is a jump value: mainly the search
string length, but appropriately smaller values for the characters that
occur in the search string. Then it's a matter of indexing the array
with the Nth character and jumping by the value of the corresponding
element.
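
To make the jump table and the backwards check concrete, here is a
minimal sketch in Python rather than Delphi. It is not the code offered
in this post; the names build_jump_table and bmh_find are made up for
illustration, and it assumes the file contents are already in memory as
single-byte characters.

```python
def build_jump_table(pattern: bytes) -> list:
    """Return a 256-entry list of jump distances, one per possible byte."""
    n = len(pattern)
    # Bytes that never occur in the pattern allow a jump of the full length.
    table = [n] * 256
    # Every pattern byte except the last gets its distance from the end.
    # Later occurrences overwrite earlier ones, giving the smaller jump.
    for i in range(n - 1):
        table[pattern[i]] = n - 1 - i
    return table


def bmh_find(data: bytes, pattern: bytes, start: int = 0) -> int:
    """Return the index of the first match at or after start, or -1."""
    n = len(pattern)
    if n == 0:
        return start
    table = build_jump_table(pattern)
    pos = start
    last_start = len(data) - n
    while pos <= last_start:
        # Compare backwards, beginning with the last pattern character.
        j = n - 1
        while j >= 0 and data[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Jump by the table entry for the byte under the pattern's last slot.
        pos += table[data[pos + n - 1]]
    return -1
```

For instance, bmh_find(b"the quick brown fox", b"brown") should return
10, and a search string that is absent returns -1. A Delphi version
storing the jumps as real byte values would limit the search string to
255 characters; the Python list has no such limit.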

This sounds like a lot of coding, but the speed increase is surprising.
I have code if you're interested.

Alan Lloyd
