Matching stuff in arbitrary order with regex



Hi,

I have been wondering whether it is possible to match and capture strings in arbitrary order with a single regular expression.

For example, suppose you need to capture the attributes of an HTML <img> tag, such as the alt, title and src attributes. The problem is that they can come in any order.

Thus, all these variations are valid and equivalent:

<img src="mypic.jpg" alt="mypic" title="this is my picture">
<img alt="mypic" title="this is my picture" src="mypic.jpg">
<img title="this is my picture" src="mypic.jpg" alt="mypic">

The first one could be matched and captured with this regex:
<img src="([^"]*)" alt="([^"]*)" title="([^"]*)">

But that will not match the other two. Of course, it could be done with the alternation operator |, listing every possible order, but there are also other attributes such as class and border. Allowing for all permutations would quickly become impractical.
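
To make the problem concrete, here is a small sketch in Python. The choice of Python's re module and the variable names are only my assumptions for illustration; the pattern is the fixed-order one with a capturing group around each value.

```python
import re

# The three equivalent tags from above.
tags = [
    '<img src="mypic.jpg" alt="mypic" title="this is my picture">',
    '<img alt="mypic" title="this is my picture" src="mypic.jpg">',
    '<img title="this is my picture" src="mypic.jpg" alt="mypic">',
]

# Fixed-order pattern: src, then alt, then title, one group per value.
fixed_order = re.compile(r'<img src="([^"]*)" alt="([^"]*)" title="([^"]*)">')

results = [fixed_order.fullmatch(tag) for tag in tags]
matched = [m is not None for m in results]

print(matched)              # [True, False, False]
print(results[0].groups())  # ('mypic.jpg', 'mypic', 'this is my picture')
```

Only the first tag matches; the other two are the same tag as far as HTML is concerned, but the pattern rejects them.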

If we were to just match the text, it could be done easily with this expression:

<img (?:(src|alt|title|border|class)="([^"]*)"\s*)+>


But it only allows us to capture one of the attributes, since a repeated capturing group generally keeps only its last match.
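
Here is a minimal sketch of what I mean, again assuming Python's re module (other engines may behave differently, and the variable names are just for illustration):

```python
import re

tag = '<img src="mypic.jpg" alt="mypic" title="this is my picture">'

# Accepts the attributes in any order, but has only two capturing
# groups in total, no matter how many times the outer group repeats.
any_order = re.compile(r'<img (?:(src|alt|title|border|class)="([^"]*)"\s*)+>')

m = any_order.fullmatch(tag)
last_pair = m.groups()

print(last_pair)  # ('title', 'this is my picture')
```

The whole tag matches, but src and alt are gone: each repetition overwrites the two groups, so only the last attribute survives.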

So, can it be done?

If there is a better group to target the question, please let me know!

Regards,
Martin