Re: Regexp, ***= and subexpressions

From: Kaitzschu (kaitzschu_at_kaitzschu.cjb.net.nospam.plz.invalid)
Date: 05/29/04


Date: Sat, 29 May 2004 07:35:21 +0300

It is 4:57 AM and awfully sunny. And when you can't sleep on a Saturday
morning, what else can you do but read newsgroups, code new things and
save your SUN mouse from drowning in orange juice?
I really need to get sleeping shades.

On Fri, 28 May 2004, Bruce Hartweg wrote:

> You saw the part about it being ...the rest of the RE... but missed the
> part that says that it is only valid at the start of the RE (that is,
> the start of the entire RE, not a part of it).

True, I was being hasty and missed "-- (or an initial ***= director) has
specified that the user's input be treated as a literal string rather than
as an RE." The words "user's input" make it clear that this does not apply
to smaller portions of an RE, but I got carried away by "If an RE of any
flavor begins with" and by the definition of the (re) atom, "(where re is
any regular expression)". So, my bad, but the documentation deserves some
of the blame, too.
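A quick check of the documented behaviour, assuming a stock Tcl 8.x interpreter: when ***= opens the pattern, everything after it is matched as a literal string, so metacharacters like "." lose their meaning. The variable names are just for illustration.

```tcl
# ***= at the very start of the pattern: the rest is a literal string.
set literalHit  [regexp {***=a.b} {a.b}]  ;# 1
set literalMiss [regexp {***=a.b} {axb}]  ;# 0, "." is not a wildcard here
# Without the director, "." is an ordinary metacharacter matching any char.
set reHit       [regexp {a.b} {axb}]      ;# 1
puts "$literalHit $literalMiss $reHit"
```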

> I see a few solutions: use ***= at the beginning and lose your ability to
> use the \m \M to avoid internal word matching
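The trade-off Bruce describes, in concrete terms (a sketch, assuming Tcl 8.x AREs): after ***=, the sequences \m and \M are no longer word constraints but just the literal characters backslash, m, backslash, M.

```tcl
set text {a cat and a concatenation}
# Literal mode: looks for the seven characters \mcat\M, which are not in $text.
set literal [regexp {***=\mcat\M} $text]   ;# 0
# ARE mode: \m and \M are word-start and word-end constraints.
set words   [regexp -all {\mcat\M} $text]  ;# 1, the "cat" inside "concatenation" is skipped
puts "$literal $words"
```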

> or skip regexp altogether and use [string first] to see if it matches
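For comparison, a minimal sketch of the [string first] route: the needle is always taken literally, and passing a start index lets you walk every occurrence. The sample text is made up for illustration.

```tcl
set text {status: [DS] pending, [DS] done}
# [string first] treats its needle as a literal string, so no escaping is needed.
set first [string first {[DS]} $text]    ;# 8
set miss  [string first {D.S} $text]     ;# -1, "." is not a wildcard here
# Collect every occurrence by restarting the search just past each hit.
set hits {}
set start 0
while {[set i [string first {[DS]} $text $start]] >= 0} {
    lappend hits $i
    set start [expr {$i + 1}]
}
puts "$first $miss $hits"
```

Note that this gives literal matching only; there is no equivalent of \m or \M, which is why it does not cover whole-word searches.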

Unfortunately, no can do. This regexp mess is part of a Definition-of-
Overkill-style stream/buffer highlighter/searcher procedure that uses a
Text widget to make things easier. Running [string first] over the whole
widget every time the highlight terms change just doesn't sound good
enough for me to abandon the Text widget's [search -regexp].
And don't worry, it wouldn't qualify as Overkill if it didn't use
[search -exact] in cases where a regexp isn't required :P

> or your regsub solution (which I would do as a [string map ...])
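One common way to write a [regsub]-based escaper (a sketch, not necessarily Kevin's code, which is not quoted here) is to put a backslash in front of every non-word character. In an ARE, a backslash followed by a non-alphanumeric character always stands for that character literally, so this is safe even for characters that were never special. The proc name escapeNonWord is hypothetical.

```tcl
# Hypothetical helper: backslash every non-word character so that the
# input is matched literally when used as an ARE.
proc escapeNonWord {s} {
    regsub -all {\W} $s {\\&} escaped
    return $escaped
}

set pattern [escapeNonWord {[DS] (a+b)?}]
puts $pattern
```

Escaping every \W character, rather than keeping a list of metacharacters, means nothing can be forgotten.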

I am now applying [string map] from Donald's snippet and am facing new
problems, as described at the end. I was leaning towards [regsub] because
of its ease of use, as Kevin already showed.

> or you could train your users/tester (update your help screen/user
> manual/whatever) that they are providing an RE not a string and they
> need to handle escaping when desired.

If I can't handle regular expressions using Tcl's documentation, it would
be instant bad karma even to consider forcing poor users to handle regexps
based on my interpretation of the manual pages :)

On Sat, 28 May 2004, Donald Arseneau wrote:
> You have to sanitize the user input first.
>
> ... snip from something I did a month ago ...
>
> # First, escape special characters in title string
> set tpattern [string map \
> {\\ \\\\ . \\. , \\, [ \\[ ] \\] \{ \\\{ \} \\\} * \\* ( \\( ) \\) ^ \\^ \$ \\\$} \
> $title]
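The quoted map leaves out three ARE metacharacters, namely +, ? and |, so a term such as a+b would still be read as a pattern. A sketch of the same [string map] approach with those added (the comma entry is dropped, since a comma is not special). The proc name escapeMeta is hypothetical.

```tcl
# Hypothetical helper built on the [string map] idea above, extended with + ? |.
# Every key is a single metacharacter and [string map] scans its input once,
# so the backslashes it inserts are never escaped a second time.
proc escapeMeta {s} {
    return [string map {
        \\ \\\\  . \\.  [ \\[  ] \\]  \{ \\\{  \} \\\}
        * \\*  + \\+  ? \\?  ( \\(  ) \\)  ^ \\^  \$ \\\$  | \\|
    } $s]
}

puts [escapeMeta {a+b}]
```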

This did fix the matching part, but, as usual, a new problem arose.
It seems that regexp can't match word boundaries at "non-word"
characters; consider this
   regexp {\*} {*}
returning true and this
   regexp {\m\*\M} {*}
returning false. That of course means I still can't match \m\[DS\]\M, and
that makes me a sad panda. So, what would be the next logical step? :)
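Why it fails: \m only matches where a word character (alphanumeric or underscore) follows a non-word character, and \M only where a word character is followed by a non-word one, so neither can ever hold right next to * or [. One possible next step, sketched here as an assumption rather than as the thread's answer, is to add \m only when the term starts with a word character and \M only when it ends with one, leaving punctuation edges unconstrained. The proc name wholeWordPattern is hypothetical.

```tcl
# Hypothetical helper: escape the term, then add \m and \M only at edges
# where the term itself has a word character, since those constraints
# can never match next to punctuation such as [ or *.
proc wholeWordPattern {term} {
    regsub -all {\W} $term {\\&} pattern
    if {[regexp {^\w} $term]} { set pattern "\\m$pattern" }
    if {[regexp {\w$} $term]} { set pattern "$pattern\\M" }
    return $pattern
}

puts [wholeWordPattern {[DS]}]   ;# \[DS\]
puts [wholeWordPattern {cat}]    ;# \mcat\M
puts [wholeWordPattern {a.b}]    ;# \ma\.b\M
```

The trade-off is that [DS] will then also match inside x[DS]y. Since the Text widget's [search -regexp] should use the same ARE engine, the result could be passed along as, e.g., `$t search -regexp -- [wholeWordPattern $term] 1.0 end`.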

But now Earth has rotated and I can get back to sleep.

-- 
-Kaitzschu

