Re: reg expression example...



Eric Sosman wrote:
On 8/29/2010 8:46 PM, john wrote:
Eric Sosman wrote:
On 8/29/2010 6:32 PM, john wrote:
Hi All,
I need to process a large text file and I'm using this expression.
All I know is that these words appear in this order.

"word1.*word2(.*)word3.*word4.*word5"

How can I optimize it?

What separates the words from each other, and how do you know
you've reached the end of the interstitial space and reached the
start of the next word?

Or if you're actually looking for lines like

word1word2buzzword3lightyearword4mumbleword5

... then you have my sympathies.


Yeah, all I know is the order, and I need to get the piece between word2
and word3. It's possible to have multiple word1...word5 patterns, and not
all of them include the other words.

Pattern pattern = Pattern.compile(
    "word1.*word2(.*)word3.*word4.*word5", Pattern.MULTILINE | Pattern.DOTALL);

So, from "word1word2buzzword3lightyearword4mumbleword5", literally,
you want to extract "buzz" as the group between "word2" (those exact
five characters) and "word3" (those five)?
yes.
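
For reference, the exchange above can be checked with a minimal sketch like the one below. The class and method names (ExtractBetween, extract) are made up for illustration; the expression and flags are the ones posted. As far as I know, MULTILINE only changes how ^ and $ behave, so it should make no difference for this expression, but it is kept here to match the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractBetween {
    // Same expression and flags as in the post.
    private static final Pattern PATTERN = Pattern.compile(
            "word1.*word2(.*)word3.*word4.*word5",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns the text captured between word2 and word3,
    // or null when the input does not contain the full sequence.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints "buzz"
        System.out.println(extract("word1word2buzzword3lightyearword4mumbleword5"));
        // prints "null": the other markers are missing, so nothing matches
        System.out.println(extract("word9word8buzzword7lightyearword6word5"));
    }
}
```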

I need to use word1 and word5 as start and end of this pattern, but
there may be other word1...word5 patterns which don't include word3/word4 - I don't need them.

Actually, I used "word1.*word2(.*?)word3.*word4.*word5"
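
The difference between (.*) and (.*?) only shows up when word3 occurs more than once after word2. A small sketch, assuming a made-up input with two word3 markers (the class name and the input string are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Compiles the given expression with DOTALL and returns group 1 of the
    // first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: word3 appears twice between word2 and word4.
        String input = "word1word2Aword3Bword3Cword4word5";

        // Greedy group runs up to the LAST word3 that still lets the rest match.
        // prints "Aword3B"
        System.out.println(firstGroup("word1.*word2(.*)word3.*word4.*word5", input));

        // Reluctant group stops at the FIRST word3 that lets the rest match.
        // prints "A"
        System.out.println(firstGroup("word1.*word2(.*?)word3.*word4.*word5", input));
    }
}
```

Note that the leading .* before word2 is still greedy in both versions, so it will skip ahead to the last word2 that allows a match.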

And you want to reject (not
match) "word9word8buzzword7lightyearword6word5"?
yes.
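
On the point about several word1...word5 runs in one file: with DOTALL, .* is free to run past a word5 into the next run, so a single match may span several runs. One possible way to keep each match inside one run is to forbid word1 and word5 from starting anywhere in the gaps. This is only a sketch under the assumption that runs never nest and that word1/word5 never occur inside a run; the class name, the INSIDE helper and the sample input are all made up, and I have not measured how this performs on large input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockScopedMatch {
    // "Any character, as long as neither word1 nor word5 starts here."
    private static final String INSIDE = "(?:(?!word1|word5).)*?";

    // Variant that cannot cross a word1/word5 boundary (assumption: runs do not nest).
    static final Pattern SCOPED = Pattern.compile(
            "word1" + INSIDE + "word2(" + INSIDE + ")word3"
                    + INSIDE + "word4" + INSIDE + "word5",
            Pattern.DOTALL);

    // The expression from the post, for comparison.
    static final Pattern PLAIN = Pattern.compile(
            "word1.*word2(.*?)word3.*word4.*word5", Pattern.DOTALL);

    // Collects group 1 of every match found in the input.
    static List<String> allGroups(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical input: a full run, a run without word2..word4, another full run.
        String input = "word1word2buzzword3lightyearword4mumbleword5"
                + "word1skipword5"
                + "word1word2fizzword3xword4yword5";

        System.out.println(allGroups(SCOPED, input)); // prints [buzz, fizz]
        System.out.println(allGroups(PLAIN, input));  // prints [fizz]
    }
}
```

With the posted expression the one match covers the whole input and only the last piece comes back; the scoped variant returns one piece per complete run and skips the incomplete one.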



This works, but I guess it's not the most efficient way...

It's the straightforward approach for the problem you've described.
Straightforward very often equals best, for many definitions of "best."
Have you made measurements that indicate it's not "good enough?"
no.
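
A rough way to get such a measurement is sketched below. It is a crude wall-clock timing, not a proper benchmark (a harness such as JMH would be more trustworthy), and the synthetic input, filler size and round counts are assumptions made up for the sketch. Timing against the real file would be more telling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTiming {
    // Builds one synthetic word1...word5 run with long filler between the markers.
    // The filler and the "buzz" payload are invented for this sketch.
    static String sampleInput(int fillerLength) {
        String filler = "x".repeat(fillerLength);
        return "word1" + filler + "word2" + "buzz" + "word3"
                + filler + "word4" + filler + "word5";
    }

    // Runs find() 'rounds' times and returns the average time per run in nanoseconds.
    static long averageNanos(Pattern pattern, String input, int rounds) {
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            Matcher m = pattern.matcher(input);
            if (!m.find()) {
                throw new IllegalStateException("expected a match");
            }
        }
        return (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(
                "word1.*word2(.*?)word3.*word4.*word5", Pattern.DOTALL);
        String input = sampleInput(100_000);

        // Warm-up pass so the JIT has a chance to compile the hot path.
        averageNanos(pattern, input, 50);

        System.out.println("avg ns per find(): " + averageNanos(pattern, input, 200));
    }
}
```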