Re: asm grep



On Sun, 30 Dec 2007 04:29:02 GMT
Frank Kotler <fbkotler@xxxxxxxxxxx> wrote:

> I don't like to contradict the
> pioneers, but "another side" might be "It's never too early to start
> thinking about optimization!"

I fully agree. Designing for function and designing for performance
should go hand-in-hand. However, the third leg of that particular stool
is a design which will allow additional features to be added in a
subsequent release. And it is in this final respect that we hobbyists
so often screw up.

In your case, you started with the goal of optimizing the asmutils
version, and you naturally thought about the benefits of ignoring
line boundaries, but it didn't take long until the discussion got
around to desired features which negated that advantage.

> Rod initially
> mentioned that "repnz scasb" would beat my "astrlen" macro - which it
> did, but only by one byte (due to push/pop edi, adjust ecx)

Perhaps, with the right design, it would not be necessary to save and
restore edi, or to adjust ecx.

> how 'bout if we store each byte of the "needle"
> as a dword. The character, the character "flipped" if we're doing
> "-i" and it's alpha - the character duplicated, if not. That leaves us 16
> bits for "flags" such as "must match start of line", "must match end
> of line"... maybe "match any alpha", "match any number". "match multiple
> characters" might be a problem.

I think that using "NnEeEeDdLlEe" for the search string is an excellent
idea -- far better than masking case bits. But trying to bit map the
other match requirements has a number of problems.
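To make the "NnEeEeDdLlEe" idea concrete, here is a rough sketch in C
rather than assembly. The names expand_needle and find_pairs are made up
for illustration, and the pairs are stored as two bytes per character
with no flag bits, which is an assumption on my part and not the dword
layout Frank proposed. It relies on isalpha/toupper/tolower behaving as
they do in the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Expand needle into byte pairs. With ignore_case set, each letter
   becomes {upper, lower}; every other byte (and every byte when case
   matters) is simply duplicated.
   out must hold 2 * strlen(needle) bytes. Returns the needle length. */
size_t expand_needle(const char *needle, int ignore_case, unsigned char *out)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)needle[i];
        if (ignore_case && isalpha(c)) {
            out[2 * i]     = (unsigned char)toupper(c);
            out[2 * i + 1] = (unsigned char)tolower(c);
        } else {
            out[2 * i] = out[2 * i + 1] = c;
        }
    }
    return n;
}

/* Return the offset of the first match in hay[0..hay_len), or -1.
   A haystack byte matches position i if it equals either byte of pair i. */
long find_pairs(const unsigned char *hay, size_t hay_len,
                const unsigned char *pairs, size_t n)
{
    if (n == 0) return 0;          /* empty needle matches at offset 0 */
    if (n > hay_len) return -1;
    for (size_t start = 0; start + n <= hay_len; start++) {
        size_t i = 0;
        while (i < n && (hay[start + i] == pairs[2 * i] ||
                         hay[start + i] == pairs[2 * i + 1]))
            i++;
        if (i == n) return (long)start;
    }
    return -1;
}
```

The inner loop is the same for both modes: the case decision is made
once, when the needle is expanded, and no case bits are masked while
scanning.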

To begin with, matching the start of, or end of, a line is a
requirement for the search string as a whole -- not of the individual
characters. Next, since you don't want to pay the case insensitive
performance penalty when the user doesn't request it, you might well
wish to use a simple "repz cmpsb" for the (default) case sensitive
compare with no special matching rules. And, of course, you have
already mentioned the problem of encoding "match multiple characters".
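As a rough illustration of the first two points, again in C and not in
assembly: the anchors can be kept as flags on the pattern as a whole,
where they only restrict which start offsets get tried, while the
default compare stays a plain memcmp (standing in here for
"repz cmpsb"). The names line_matches, ANCHOR_START and ANCHOR_END are
hypothetical, and the sketch assumes the input has already been split
into lines.

```c
#include <stddef.h>
#include <string.h>

enum { ANCHOR_START = 1, ANCHOR_END = 2 };  /* flags for the whole pattern */

/* Does line[0..line_len) contain pat, subject to the anchor flags?
   Case sensitive, no special per-character rules. */
int line_matches(const char *line, size_t line_len,
                 const char *pat, size_t pat_len, unsigned flags)
{
    if (pat_len > line_len) return 0;

    /* Range of candidate start offsets; the anchors only narrow it. */
    size_t first = 0, last = line_len - pat_len;
    if (flags & ANCHOR_START) last = 0;
    if (flags & ANCHOR_END)   first = line_len - pat_len;

    for (size_t off = first; off <= last; off++)
        if (memcmp(line + off, pat, pat_len) == 0) return 1;
    return 0;
}
```

With both flags set, the range is non-empty only when the line and the
pattern have the same length, so the whole line must equal the pattern.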

What I think one would want to do is to group the search string
into segments of one or more characters which have the same match
requirements.
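
One possible shape for such segments, sketched in C. Everything here
(struct segment, the three rule kinds, match_at) is hypothetical, and
every segment has a fixed length, so this does not address "match
multiple characters".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum seg_kind { SEG_EXACT, SEG_NOCASE, SEG_DIGITS };  /* hypothetical rules */

struct segment {
    enum seg_kind kind;
    const char *text;   /* literal bytes; unused for SEG_DIGITS */
    size_t len;         /* number of haystack bytes the segment consumes */
};

/* Does the segment match at p? The caller guarantees len readable bytes. */
static int seg_match(const struct segment *s, const char *p)
{
    switch (s->kind) {
    case SEG_EXACT:                       /* the fast, default compare */
        return memcmp(p, s->text, s->len) == 0;
    case SEG_NOCASE:
        for (size_t i = 0; i < s->len; i++)
            if (tolower((unsigned char)p[i]) !=
                tolower((unsigned char)s->text[i]))
                return 0;
        return 1;
    case SEG_DIGITS:
        for (size_t i = 0; i < s->len; i++)
            if (!isdigit((unsigned char)p[i])) return 0;
        return 1;
    }
    return 0;
}

/* Match all segments back to back starting at p; avail bytes are readable. */
int match_at(const struct segment *segs, size_t nsegs,
             const char *p, size_t avail)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].len > avail || !seg_match(&segs[i], p)) return 0;
        p += segs[i].len;
        avail -= segs[i].len;
    }
    return 1;
}
```

A search string with no special rules collapses to a single SEG_EXACT
segment, so the common case pays for nothing beyond one block compare.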

-- Chuck
