Re: Regex help
- From: Jerry Stuckle <jstucklex@xxxxxxxxxxxxx>
- Date: Tue, 16 Oct 2007 08:43:56 -0400
Steve wrote:
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message news:bbudnS3Qb-29T47anZ2dnUVZ_uzinZ2d@xxxxxxxxxxxxxx
Steve wrote:
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message news:u9WdnU2yhZ2Q5o7anZ2dnUVZ_trinZ2d@xxxxxxxxxxxxxx
Steve wrote:
ok, here's the one to get the select:
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message news:K-qdnTSkY4NaoI7anZ2dnUVZ_j6dnZ2d@xxxxxxxxxxxxxx
If you have the time, I would appreciate it. Otherwise I can struggle through this myself :-)
Steve wrote:
well, that's not so good a start! i'll break out the old regex ide and fix that...if you want.
"Jerry Stuckle" <jstucklex@xxxxxxxxxxxxx> wrote in message news:KaadnQnnGt0WT4_anZ2dnUVZ_tajnZ2d@xxxxxxxxxxxxxx
Hi, Steve,
OK, I give up here. I am DEFINITELY not a Regex expert, and have been working on this for hours with no luck.
alright, jer. let's see what we can do...
Basically I need to parse a page for certain information which will be fed back into CURL to post to a site. I need to find four types of tags on the page:
<input type=hidden name=a1 value=b1>
<input type=text name=a2>
<input type=submit name=a3 value=b3>
<select name=a4>
I don't need any other tags.
From the hidden and submit types, I need name and value. From the text and select types, I just need the name.
I can assume the attributes will always show up in this order, but there may be other things between the < and > delimiters. Additionally, the actual type and name may have single or double quotes around them, or neither.
Does anyone have some code for this? It doesn't have to be all one regex.
here's an eyeballed attempt:
<(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>
to keep it easier, i'd think about using that to get your general matches. iterating through those, i'd apply another regex to break out the name, type, and value. you could very well catch it all in the above; however, it's not as straightforward and hence not easily maintained. if you need additional help on writing this, let me know. i'll pseudo-code the whole enchilada if you want. this should be sufficient in getting only those tags you listed above...which is a good start.
btw, make the search caseINsensitive.
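The two-pass idea described above (first collect the tags of interest, then run a second regex over each tag to break out type, name, and value) might look roughly like the sketch below. The thread is about PHP's preg functions; Python's re module is used here only for illustration. The patterns and the names TAG_RE, ATTR_RE, and extract_fields are assumptions made for this sketch, not the regexes posted in the thread, and a regex-based approach will still miss unusual markup.

```python
import re

# Pass 1: whole tags only. Matches <select ...> and <input ...> whose type
# is hidden, text, or submit. Case-insensitive, as suggested in the thread.
TAG_RE = re.compile(
    r"""<(?:select\b[^>]*|input\b[^>]*?\btype\s*=\s*['"]?(?:hidden|text|submit)\b[^>]*)>""",
    re.IGNORECASE,
)

# Pass 2: attribute pairs inside one tag. The value may be double-quoted,
# single-quoted, or bare (no quotes at all).
ATTR_RE = re.compile(
    r"""\b(type|name|value)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)

def extract_fields(html):
    """Return one dict per matching tag, keyed by lower-cased attribute name."""
    fields = []
    for tag in TAG_RE.findall(html):
        attrs = {}
        for m in ATTR_RE.finditer(tag):
            # Exactly one of the three value groups takes part in each match.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs[m.group(1).lower()] = value
        if tag.lower().startswith("<select"):
            # <select> has no type attribute; tag it so callers can tell it apart.
            attrs["type"] = "select"
        fields.append(attrs)
    return fields
```

Keeping the tag filter and the attribute parser separate means each pattern stays short enough to read and to fix on its own, which is the maintainability point made above.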
Yep, it's a start. Some problems (output below), but I think it will get me a little farther.
And you're right, I already gave up on getting everything in one pass. I was thinking of trying to just get everything for a single element type (i.e. all <input type=text ...> elements), but this gives me another idea, also.
And the output from the first try:
Array
(
[0] => Array
(
[0] => <select n
[1] => <select n
[2] => <select n
)
[1] => Array
(
[0] => select n
[1] => select n
[2] => select n
)
[2] => Array
(
[0] =>
[1] =>
[2] =>
)
[3] => Array
(
[0] =>
[1] =>
[2] =>
)
[4] => Array
(
[0] =>
[1] =>
[2] =>
)
)
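The truncated "<select n" matches above can be reproduced. In the first-attempt pattern the | sits at the top level, so the left-hand alternative is just <(select\s?[^>].*?) with nothing after the lazy .*?, which is therefore free to match zero characters. The sketch below runs the pattern unchanged through Python's re module (an assumption for illustration; the thread used PHP) and compares it with a hypothetical variant, FIXED, that groups the alternation and ends the tag with [^>]*>. The unquoted input sample is only one possible reason why no input tags showed up in the output; the actual page is not shown in the thread.

```python
import re

# The first-attempt pattern, copied from the thread unchanged.
FIRST_TRY = re.compile(
    r"""<(select\s?[^>].*?)|(input\s[^t]*?type\s*?=\s?('|"|\s)(hidden|text|submit)\3[^>].*?)>""",
    re.IGNORECASE,
)

# Left alternative stops as soon as it can: "<select", one space, one character.
truncated = FIRST_TRY.search("<select name=a4>").group(0)
print(repr(truncated))  # '<select n'

# The right alternative demands a quote or whitespace before the type value,
# so an unquoted type is not matched at all.
unquoted = FIRST_TRY.search("<input type=hidden name=a1 value=b1>")
print(unquoted)  # None

# Hypothetical variant: group the alternation so "<" and ">" apply to both
# branches, and let [^>]* run up to the closing bracket.
FIXED = re.compile(r"<(?:select|input)\b[^>]*>", re.IGNORECASE)
whole = FIXED.search("<select name=a4>").group(0)
print(repr(whole))  # '<select name=a4>'
```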
(select)\s*?[^n].*?(name)\s*?=\s*?(?:\'|")?([^\3>]*)?\3?\s*?[^>]
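One thing to watch in a pattern like the one above: a backreference can only point at a capturing group, and inside a character class a sequence such as \3 is read as an octal character escape rather than a backreference. A minimal sketch of the same idea with the quote captured in its own group follows. It is written for Python's re module as an illustration, and SELECT_NAME and select_names are names invented here, not part of the thread. It assumes the name value contains no whitespace.

```python
import re

# Group 1 captures the opening quote (which may be empty), so \1 can insist
# on the same character closing the value. Group 2 is the name itself.
SELECT_NAME = re.compile(
    r"""<select\b[^>]*?\bname\s*=\s*(['"]?)([^'"\s>]+)\1""",
    re.IGNORECASE,
)

def select_names(html):
    """Return the name attribute of every <select> tag found in html."""
    return [m.group(2) for m in SELECT_NAME.finditer(html)]
```

Called on a page containing <select name=a4>, <SELECT class=x name="a5"> and <select id='s' name='a6' size=3>, this should return the three names in document order.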
here's the one to break out the inputs and capture each type, name, and value:
(input)\s*?[^n].*?(?:(name|type|value)
hey...did you notice this above? it should be [^ntv]
that may account for some of the weirdness. ;^)
Yep, and I got it working. Thanks again!
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@xxxxxxxxxxxxx
==================