Re: Speed comparison of regex versus index, lc, and / /i



On Fri, 30 May 2008 13:28:21 +0000, John W. Krahn wrote:

> Just the opposite. AFAIK if searching for a literal string (as opposed
> to a regular expression pattern) the regexp engine will use the same
> algorithm as index().

I don't know what it does internally, but using actual patterns rather
than literal strings in the regular expression match, such as
"something|else" or "first.*second", did not result in a significant
slowdown. The search string did not change at all during the execution
of the program, so the regular expression would only have been compiled
once.
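
For anyone who wants to check this on their own perl, here is a minimal
sketch of the comparison using the core Benchmark module. It is not the
program discussed above: the test lines, the sub names count_index and
count_regex, and the iteration count are all made up for illustration,
and the numbers will depend on the perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 2,000 filler lines, one of which contains the target.
my @lines = map { "line $_ with some filler text and nothing special" } 1 .. 2_000;
$lines[1_000] .= " something";

my $needle  = 'something';
my $literal = qr/something/;        # literal string as a regex
my $alt     = qr/something|else/;   # alternation
my $wild    = qr/first.*second/;    # pattern with a wildcard

# Count the lines containing $needle using index().
sub count_index {
    my $n = 0;
    for (@lines) { $n++ if index($_, $needle) >= 0 }
    return $n;
}

# Count the lines matching a precompiled regex.
sub count_regex {
    my ($re) = @_;
    my $n = 0;
    for (@lines) { $n++ if $_ =~ $re }
    return $n;
}

# Run each variant 500 times and print a comparison table.
cmpthese(500, {
    index       => sub { count_index() },
    literal_re  => sub { count_regex($literal) },
    alternation => sub { count_regex($alt) },
    wildcard    => sub { count_regex($wild) },
});
```

If Benchmark complains about too few iterations, raise the count or pass
a negative number (e.g. -2) to run each case for that many CPU seconds.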

> I assume that most of the slowdown was caused by the introduction of the
> use of UTF, etc.

No - the "lc"-related slowdown was there even when I read the files in
as bytes and did not convert them into anything. I'm sure of this
because I only switched to UTF-8 halfway through coding, to fix an
unrelated problem, and by that point I'd already noticed that "lc" or
/ /i more than doubled the execution time of the program. In fact, at
the same time that I converted the searched files into UTF-8, I also
converted them to lower case.
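
To make the three alternatives concrete, here is a rough sketch (again
with invented data and hypothetical sub names, not the original program)
that compares calling lc on every line, matching with /i, and searching
data that was lowercased once up front. It reads nothing from disk and
does no encoding conversion, so any difference it shows comes from the
case handling alone.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical mixed-case test data; one line contains the target.
my @lines = map { "Line $_ With Mixed Case Filler Text" } 1 .. 2_000;
$lines[1_000] .= " SomeThing";

# The same data lowercased once, as if the files had been converted beforehand.
my @lowered = map { lc $_ } @lines;

my $needle = 'something';

# Lowercase each line on every search, then use index().
sub with_lc {
    my $n = 0;
    for (@lines) { $n++ if index(lc($_), $needle) >= 0 }
    return $n;
}

# Case-insensitive regex match on the original lines.
sub with_i {
    my $n = 0;
    for (@lines) { $n++ if /something/i }
    return $n;
}

# Plain index() on the pre-lowercased copy.
sub prelowered {
    my $n = 0;
    for (@lowered) { $n++ if index($_, $needle) >= 0 }
    return $n;
}

cmpthese(500, {
    lc_each_line => \&with_lc,
    regex_i      => \&with_i,
    prelowered   => \&prelowered,
});
```

All three subs find the same single line; only the timings should differ.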

