Re: Regex help
- From: "Steve" <no.one@xxxxxxxxxxx>
- Date: Mon, 15 Oct 2007 10:37:08 -0500
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message
news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@xxxxxxxxxxxxxx
Steve wrote:
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message
news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@xxxxxxxxxxxxxx
Steve wrote:
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message
news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@xxxxxxxxxxxxxx
Hi, Steve,
OK, I give up here. I am DEFINITELY not a Regex expert, and have been
working on this for hours with no luck.
alright, jer. let's see what we can do...
Basically I need to parse a page for certain information which will be
fed back into CURL to post to a site. I need to find four types of
tags on the page:
<input type=hidden name=a1 value=b1>
<input type=text name=a2>
<input type=submit name=a3 value=b3>
<select name=a4>
I don't need any other tags.
From the hidden and submit types, I need name and value. From the
text and select types, I just need the name.
I can assume the attributes will always show up in this order, but
there may be other things between the < and > delimiters.
Additionally, the actual type and name may have single or double
quotes around them, or neither.
Does anyone have some code for this? It doesn't have to be all one
regex.
here's an eyeballed attempt:
<(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
to keep it easier, i'd think about using that to get your general
matches. iterating through those, i'd apply another regex to break out
the name, type, and value. you could very well catch it all in the
above; however, it's not as straightforward and hence not easily
maintained. if you need additional help on writing this, let me know.
i'll pseudo-code the whole enchilada if you want. this should be
sufficient in getting only those tags you listed above...which is a
good start.
btw, make the search case-insensitive.
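The two-pass idea described above (one regex to find the tags, a second to break out type, name, and value) could look roughly like the sketch below. It is not the pattern posted in this thread: the sample page, the helper name tag_attributes, and both patterns are assumptions made for illustration. It also assumes no attribute value contains a ">" and that no other attribute's value contains text like "name=".

```php
<?php
// Hypothetical sample page; the real input would be the HTML fetched with CURL.
$html = <<<'HTML'
<form action="post.php">
<input type=hidden name=a1 value=b1>
<input class="x" type="text" name='a2'>
<input type='submit' name="a3" value="b3">
<input type=checkbox name=skip value=1>
<select id=s name=a4><option>1</option></select>
</form>
HTML;

// Pass 2 helper (hypothetical name): pull type/name/value out of ONE tag.
// The (?| ... ) branch-reset group makes the value land in group 2 whether
// it was double-quoted, single-quoted, or unquoted.
function tag_attributes(string $tag): array
{
    $pattern = '/\b(type|name|value)\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $tag, $hits, PREG_SET_ORDER);
    $attrs = [];
    foreach ($hits as $hit) {
        $attrs[strtolower($hit[1])] = $hit[2];
    }
    return $attrs;
}

// Pass 1: grab every whole <input ...> or <select ...> tag, case-insensitively.
preg_match_all('/<(input|select)\b[^>]*>/i', $html, $tags, PREG_SET_ORDER);

// Keep only the four kinds of tag asked for; each entry is [kind, name, value].
$fields = [];
foreach ($tags as $tag) {
    $attrs = tag_attributes($tag[0]);
    $name  = $attrs['name'] ?? null;
    if (strtolower($tag[1]) === 'select') {
        $fields[] = ['select', $name, null];
        continue;
    }
    $type = strtolower($attrs['type'] ?? '');
    if ($type === 'hidden' || $type === 'submit') {
        $fields[] = [$type, $name, $attrs['value'] ?? null];
    } elseif ($type === 'text') {
        $fields[] = [$type, $name, null];
    }
}

print_r($fields);
```

Because the second pass looks at one attribute at a time, it does not depend on the attributes appearing in any particular order.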
Yep, it's a start. Some problems (output below), but I think it will
get me a little farther.
And you're right, I already gave up on getting everything in one pass. I
was thinking of trying to just get everything for a single element type
(i.e. all <input type=text ...> elements), but this gives me another
idea, also.
And the output from the first try:
Array
(
[0] => Array
(
[0] => <select n
[1] => <select n
[2] => <select n
)
[1] => Array
(
[0] => select n
[1] => select n
[2] => select n
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
[3] => Array
(
[0] =>
[1] =>
[2] =>
)
[4] => Array
(
[0] =>
[1] =>
[2] =>
)
)
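The truncated "<select n" matches in the output above are consistent with how the first pattern is built. The top-level | splits it into two alternatives, so the closing > belongs only to the input branch, and the select branch ends in a lazy .*? with nothing after it. A lazy quantifier with nothing to reach for matches zero characters. The small sketch below reproduces that on a made-up tag (the tag itself is an assumption, not taken from Jerry's page).

```php
<?php
// Hypothetical tag for illustration.
$tag = '<select name="a4" class="wide">';

// The select branch of the original alternation, on its own. After the one
// mandatory [^>] character, the lazy .*? is free to match nothing at all.
preg_match('/<(select\s?[^>].*?)/i', $tag, $lazy);

// The same branch with the closing > required after the lazy part, which
// forces .*? to expand until it reaches the end of the tag.
preg_match('/<(select\s?[^>].*?)>/i', $tag, $anchored);

echo $lazy[0], "\n";
echo $anchored[0], "\n";
```

Wrapping the alternatives in a group, for example <(?:select...|input...)>, is one way to make the < and > apply to both branches.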
well, that's not so good a start! i'll break out the old regex ide and fix
that...if you want.
If you have the time, I would appreciate it. Otherwise I can struggle
through this myself :-)
ok, here's the one to get the select:
(select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
here's the one to break out the inputs and capture each type, name, and
value:
(input)\s*?[^n].*?(?:(name|type|value)\s*?=\s*?(?:'|")?([^\2>]*?)\2?(?:\s)?)*?>
the problem with this one, though, is that while it debugs fine in 'the
regulator' regex ide, some of the captures are being overwritten under
preg_match_all.
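The overwriting described above matches a general property of PCRE, which the preg functions use: when a capturing group sits inside a repeated group, only the last repetition's text is kept. One possible explanation for the difference, offered as an assumption rather than something confirmed in this thread, is that a .NET-based tool can expose every repetition of a group, so the same pattern appears to work there. The sketch below uses a simplified, made-up pattern (double-quoted attributes only) to show the effect and one way around it.

```php
<?php
// Hypothetical tag for illustration.
$tag = '<input type="hidden" name="a1" value="b1">';

// Capturing groups inside a repeated group: the loop runs three times,
// but groups 1 and 2 only remember the final pass.
preg_match('/<input(?:\s+(\w+)="([^"]*)")*\s*>/i', $tag, $once);

// Matching one attribute per hit keeps all of them.
preg_match_all('/\s(\w+)="([^"]*)"/', $tag, $each, PREG_SET_ORDER);
$pairs = [];
foreach ($each as $hit) {
    $pairs[$hit[1]] = $hit[2];
}

print_r($once);
print_r($pairs);
```

Here $once ends up holding only the value attribute, while $pairs holds type, name, and value keyed by attribute name.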
the implementation would have been an array of these two patterns. preg
should return the tag (select or input)...from that point, you'd know where
in the matches to find the type, name, and value regardless of the order in
which they came. as it is, you can use $matches[0][...n] on the input pattern
matches to iterate over each full input match.
hope that helps.