Re: splitting words with brackets




"Tim Chase" <python.list@xxxxxxxxxxxxxxxxx> wrote in message
news:mailman.8598.1153966351.27775.python-list@xxxxxxxxxxxxx
r = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')
r.findall(s)
['(a c)b(c d)', 'e']

Ah, it's exactly what I want! I thought the left and right
sides of "|" were interchangeable, but that is not true.

In theory, they *should* be equal. I was baffled by the nonparity
of the situation. You *should* be able to swap the two sides of
the "|" and have it treated the same. Yet, when I tried it with
the above regexp, putting the \S first, it seemed to choke and
give different results. I'd love to know why.

Does the re do left-to-right matching? If so, then the \S will eat the
opening parens/brackets, and never get into the other alternative patterns.
\S is the most "matchable" pattern, so if it comes ahead of the other
alternatives, then it will always be the one matched. My guess is that if
you put \S first, you will only get the contiguous character groups,
regardless of ()'s and []'s. The expression might as well just be \S+.

Or I could be completely wrong...
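
A quick way to check the guess, as a rough sketch: Python's re tries the
alternatives of "|" from left to right and accepts the first one that
matches, not the longest. The test string below is an assumption (the
original s was not shown in this message), and the names groups_first and
nonspace_first are made up for illustration.

```python
import re

# Assumed input; the original value of s is not shown in the thread.
s = "(a c)b(c d) e"

# Pattern as posted: bracketed groups are tried before \S.
groups_first = re.compile(r'(?:\([^\)]*\)|\[[^\]]*\]|\S)+')

# Same alternatives, but with \S moved to the front.
nonspace_first = re.compile(r'(?:\S|\([^\)]*\)|\[[^\]]*\])+')

print(groups_first.findall(s))    # ['(a c)b(c d)', 'e']
print(nonspace_first.findall(s))  # ['(a', 'c)b(c', 'd)', 'e']
print(re.findall(r'\S+', s))      # ['(a', 'c)b(c', 'd)', 'e']
```

With \S first, the opening paren is consumed as an ordinary non-space
character, so the bracket alternatives never get a chance to match and the
result for this input is the same as plain \S+.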

-- Paul

