Re: asm grep



Robert Redelmeier wrote:
Charles Crayne <ccrayne@xxxxxxxxxx> wrote in part:

Robert Redelmeier <redelm@xxxxxxxxxxxxxxx> wrote:

So do it the long way and mask the case bit
if that is acceptable for non-alpha too:

Which it is not, even in ASCII text files, unless you don't mind such
bugs as CR and LF matching punctuation characters. A better solution
for the ASCII character set would be to use a translate table.


Certainly you can use XLAT or more efficient equivalents.
However, I _did_ say, "if that is acceptable for non-alpha ..."
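For reference, the translate-table idea can be sketched in C. This is only an illustration of the technique, not code from the thread: the names `fold`, `init_fold` and `fold_equal` are made up here, and the table folds only ASCII a-z, so CR, LF, digits and punctuation keep their own values.

```c
#include <stddef.h>

/* Hypothetical 256-entry fold table, the C analogue of an XLAT table.
 * Only 'a'..'z' are remapped; every other byte maps to itself. */
static unsigned char fold[256];

static void init_fold(void)
{
    for (int i = 0; i < 256; i++)
        fold[i] = (unsigned char)i;
    for (int c = 'a'; c <= 'z'; c++)
        fold[c] = (unsigned char)(c - 'a' + 'A');
}

/* Compare n bytes through the table: 1 if they match ignoring
 * ASCII case, 0 otherwise. */
static int fold_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fold[a[i]] != fold[b[i]])
            return 0;
    return 1;
}
```

The cost is 256 bytes of table (or the code to build it), which matters if the goal is a sub-300-byte binary.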

I'm not sure it is not acceptable for a minimalist rescuedisk grep.
By masking, lowercase will get stripped to uppercase, punctuation
and digits to things like DLE and <space> to NUL. This might
produce some false matches, but will _not_ skip true matches.
I don't mind this for my uses of grep.
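The masking behaviour described above is easy to check with a few lines of C. This is a sketch assuming the ASCII case bit 0x20 is the one being masked; `masked_equal` is a name invented for the illustration.

```c
/* Clear bit 5 (0x20), the ASCII case bit, from both bytes before
 * comparing. Letters compare case-insensitively, but non-letters
 * collapse too: '-' (0x2D) becomes CR (0x0D), '*' (0x2A) becomes
 * LF (0x0A), '0' (0x30) becomes DLE (0x10), space (0x20) becomes NUL. */
#define CASE_MASK 0xDF

static int masked_equal(unsigned char a, unsigned char b)
{
    return (a & CASE_MASK) == (b & CASE_MASK);
}
```

Two bytes that are equal stay equal after masking, so a true match is never lost; the only errors are extra matches such as '-' against CR.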

Good point. We may be able to "get it right" without doing *too* much extra work in the loop. Herbert showed us something "doublecasing" the string "FfOoOo--BbAaRr"... fetch one byte from the "haystack" and two bytes from the "needle" - if it doesn't match one, try the other...
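A rough C sketch of how I read the "doublecased" needle idea. This is my guess at the scheme, not Herbert's actual code: `find_doubled` and its parameters are invented names, and the needle is assumed to be stored as n pairs of (uppercase, lowercase) bytes, with non-letters simply repeated.

```c
#include <stddef.h>

/* Search hay[0..hay_len) for a needle of n characters stored as
 * 2*n bytes in `pairs`: pairs[2*i] and pairs[2*i+1] are the two
 * acceptable bytes for position i (e.g. "Ff", "Oo", "--").
 * Returns a pointer to the first match, or NULL if there is none. */
static const char *find_doubled(const char *hay, size_t hay_len,
                                const char *pairs, size_t n)
{
    if (n == 0 || hay_len < n)
        return NULL;
    for (size_t start = 0; start + n <= hay_len; start++) {
        size_t i = 0;
        /* One byte from the haystack against two from the needle. */
        while (i < n &&
               (hay[start + i] == pairs[2 * i] ||
                hay[start + i] == pairs[2 * i + 1]))
            i++;
        if (i == n)
            return hay + start;
    }
    return NULL;
}
```

Because '-' is paired only with itself, a CR in the haystack does not match it, which avoids the false matches that masking produces.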

I haven't fooled at all with that. What I've fooled with is even more "obsessively small". We can address variables as "[ebp +/- ???]" in fewer bytes than as static variables (if ??? is in signed byte range). I modified what I had to utilize this... saved a bit of size, and seems to have gained some speed (???).

First, I was wrongly thinking that I'd need to have variables on the stack to do this. mov ebp, esp/ sub esp, ???, etc. This required a rework of the logic - can't just pop args anymore. I loaded esi with the appropriate value and used lodsd... seemed to work okay.

Then I realized that I could have left the variables where they were, and still accessed 'em off ebp, so I tried that modification. Smallest yet (248 bytes), but the previous version seems faster...

I'm thinking I want to get a bare-bones "base" with a good compromise of "small" and "fast" to add "options" to...

To get case-insensitive, we're going to have to abandon scasb and cmpsb and replace 'em with a "hand made" "icmpsb" and "iscasb". Will hurt size some but may not be bad for speed...
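In C terms, the "hand made" replacements might look something like the sketch below. The names `iscasb` and `icmpsb` are just the nicknames used above, `upcase` is a helper invented for the illustration, and only ASCII a-z is folded.

```c
#include <stddef.h>

/* Fold ASCII lowercase to uppercase; leave every other byte alone. */
static unsigned char upcase(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 0x20) : c;
}

/* Rough analogue of "repne scasb" ignoring case: index of the first
 * byte in buf[0..n) that equals c ignoring case, or n if none. */
static size_t iscasb(const unsigned char *buf, size_t n, unsigned char c)
{
    unsigned char want = upcase(c);
    size_t i = 0;
    while (i < n && upcase(buf[i]) != want)
        i++;
    return i;
}

/* Rough analogue of "repe cmpsb" ignoring case: number of leading
 * bytes of a and b (up to n) that match ignoring case. */
static size_t icmpsb(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;
    while (i < n && upcase(a[i]) == upcase(b[i]))
        i++;
    return i;
}
```

The range check costs two compares per byte instead of a table lookup, which trades a little speed for not carrying a 256-byte table.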

Printing filename (really need to do that) and/or offset, or just count, shouldn't be a problem... Printing non-matching lines is going to require major modification - does anyone actually use that?

Actually, I'd like an option
of stripping the high bit without having to run it through `tr`.

I'm not familiar with "tr". You want to... strip the high bit and then compare? Or strip the high bit from what we print? I've had problems with binary files trashin' my console... that might be a good one...
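If it's the second reading (strip the high bit from what we print), the operation itself is tiny. A sketch in C, with `strip_high_bit` as an invented name and the buffer assumed to be a copy of the line about to be written:

```c
#include <stddef.h>

/* Clear bit 7 of every byte in buf[0..n) so that only 7-bit values
 * reach the console. */
static void strip_high_bit(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] &= 0x7F;
}
```

This alone wouldn't stop a binary file from trashing the console, since control characters below 0x20 (ESC and friends) pass through unchanged; those would need filtering separately.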

Best,
Frank