Re: HTTP Filtering and Threads...
- From: Ben Morrow <ben@xxxxxxxxxxxx>
- Date: Fri, 14 Sep 2007 14:19:46 +0100
Quoth Dan <danett18@xxxxxxxxxxxx>:
1) I have a code in perl which is doing a HTTP request and getting a
response and saving in a variable, so I want to filter a specific
value of a field. My code is more or less like this one:
next unless /^<input name/i;
You are trying to parse HTML with regular expressions. This is a very
bad idea. I would strongly recommend using HTML::Parser, or another
module capable of actually parsing HTML.
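As a rough illustration of the parser route, here is a minimal sketch using HTML::Parser (a CPAN module, so it must be installed). The sample HTML is modelled on the snippet quoted later in this post, and the %field hash and the fallback logic are assumptions made for the example, not code from the original program.

```perl
use strict;
use warnings;
use HTML::Parser;

# Sample input modelled on the HTML quoted in the question.
my $html = <<'HTML';
<input name="$mdName" type="hidden" value="Silva">
<input name="Name" type="hidden" value="Silva">
<input name="mdName" type="hidden" value="Daniel">
HTML

my %field;    # input name => value, filled in by the start-tag handler

my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every start tag with the tag name and a hashref of attributes.
    start_h => [
        sub {
            my ($tag, $attr) = @_;
            return unless $tag eq 'input' && defined $attr->{name};
            $field{ $attr->{name} } = $attr->{value};
        },
        'tagname, attr',
    ],
);
$parser->parse($html);
$parser->eof;

# Prefer "$mdName"; fall back to "Name" if it is missing or blank.
my $middle = $field{'$mdName'};
$middle = $field{'Name'} unless defined $middle && length $middle;
print "$middle\n";
```

Because the lookup is by exact attribute name, the "mdName" field can never be picked up by accident.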
my ($name, $value) = $_ =~ /input name="(.*)Name" type=.*
value="(.*)">/i;
This will fail because the * regex operator is 'greedy': it always takes
as much text as it can. This is why you are always getting the last
value in your example below: the first .* matches everything from the
first 'name="' all the way to the middle of the last <input> tag.
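To make the greediness concrete, here is a small sketch that runs the original pattern and a bounded alternative against the three sample tags joined onto one line. The variable names are made up for the example, and a regex is still no substitute for a real HTML parser.

```perl
use strict;
use warnings;

# The three sample tags on one line, so .* can cross tag boundaries.
my $html = '<input name="$mdName" type="hidden" value="Silva">'
         . '<input name="Name" type="hidden" value="Silva">'
         . '<input name="mdName" type="hidden" value="Daniel">';

# Original pattern: both .* are greedy, so the match stretches to the last tag.
my ($greedy_name, $greedy_value) =
    $html =~ /input name="(.*)Name" type=.*value="(.*)">/i;
print "greedy: $greedy_value\n";

# Bounded pattern: [^>]* and [^"]* cannot run past the end of a tag or attribute.
# \Q...\E quotes the literal '$' in the field name.
my $wanted = '$mdName';
my ($value) = $html =~ /<input name="\Q$wanted\E"[^>]*value="([^"]*)"/i;
print "bounded: $value\n";
```

The greedy version captures "Daniel" from the last tag, while the bounded version stops inside the first tag and captures "Silva".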
if ((length($value)) > 1){
$MiddleName = $value;
#Some Stuff Code...
print "$MiddleName";<br><br>
                    ^^^^^^^^
This is not Perl. Please post the *actual* code you ran. It makes things
simpler :).
}
However the HTTP request return a HTML code that is more or less like
this:
#Some non relevante HTML stuff...
<input name="$mdName" type="hidden" value="Silva">
#Some non relevante HTML stuff...
<input name="Name" type="hidden" value="Silva">
<input name="mdName" type="hidden" value="Daniel">
#Some non relevante HTML stuff...
The problem is that my code is getting the value of "mdName" which is
"Daniel" and I want it get the value of "$mdName" which is "Silva" and
if it is missing (blank) I want to get the value of "Name" which in
the example also is "Silva". But I never want to get the value of
"mdName" which is "Daniel" and is what always is happening. :(
Obs.: I also tried (without success) use:
* my ($name, $value) = $_ =~ input name="\"\$mdName\" type=.*
value="(.*)">/i;
* my ($name, $value) = $_ =~ m/input name=\"\$mdName\" type=.*
value="(.*)">/i;
* my ($name, $value) = $_ =~ input name="\/$mdName\/" type=.*
value="(.*)">/i;
* my ($name, $value) = $_ =~ m/input name="\/$mdName\/" type=.*
value="(.*)">/i;
Uh, why? Don't just randomly try things hoping one will work; instead,
understand what is going wrong and fix it.
2) In the same program I have a piece of code which list all users and
do a loop for call the function which will get detailed information of
each user (the code in question 1 is part of this function). The
snippet is like this one:
# Some irrelevant code stuff...
(my $ruid, @userIDs) = &GetUserList($start, $end);
Don't call subs with &. It was a Perl 4 practice, and has some strange
side-effects in Perl 5.
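One of those side-effects is easy to demonstrate: calling a sub as &name; with no parentheses hands it the caller's current @_ instead of an empty argument list. A small sketch (the sub names are invented for the example):

```perl
use strict;
use warnings;

# Reports how many arguments it received.
sub count_args { return scalar @_ }

sub outer {
    my $with_amp = &count_args;     # no parens: sees outer's own @_
    my $plain    = count_args();    # ordinary call: empty argument list
    return ($with_amp, $plain);
}

my ($with_amp, $plain) = outer(1, 2, 3);
print "$with_amp $plain\n";    # prints "3 0"
```

The & form also bypasses any prototype on the sub, which is another reason to prefer the plain call.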
if ($userIDs[0] == -1) { exit(0); }
foreach $userID (@userIDs) {
&GetUserData($name, $middlename, $lname, $bdate);
Your sub GetUserData seems to be directly updating the variables passed
to it. This is a bad idea as it is not what someone reading the code
will expect. It would be better to return a list and call like
my ($name, $middlename, $lname, $bdate) = GetUserData;
Also, it seems to be getting the value of the user ID from a global
variable: again, it would be better to pass it to the function.
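Put together, the shape being suggested might look like the sketch below. The body of GetUserData here is a hypothetical stand-in with made-up data; the real sub would do the HTTP request and the parsing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real sub.
sub GetUserData {
    my ($userID) = @_;    # the ID arrives as an argument, not via a global

    # Made-up record, purely for illustration.
    my %record = (
        7 => [ 'Daniel', 'Silva', 'Souza', '1980-01-01' ],
    );
    my $row = $record{$userID} or return;    # empty list for an unknown ID
    return @$row;    # name, middle name, last name, birth date
}

my ($name, $middlename, $lname, $bdate) = GetUserData(7);
print "7\t: $name, $middlename, $lname, $bdate\n";
```

Nothing outside the sub is modified, so the call site shows exactly what goes in and what comes out.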
print "$userID\t: $name, $middlename, $lname, $bdate";
# Some irrelevant code stuff...
}
# Some irrelevant code stuff...
The function GetUserData() is really slow, it do HTTP Request, parse
some HTML stuff and the amount of users is big. So I would like to add
thread support to it, in a fashion that I could have for example 8
instances of this code running in parallel. :)
Note that this may well not make it run faster. Unless you have 8
processors (lucky you ;) ), it will just make things slower.
One thing that may be slowing things down is if you are fetching and
parsing the same page many times. You may want to look at the Memoize
module as an easy way of avoiding that.
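A minimal sketch of what that could look like, assuming the page fetch lives in its own sub. The name fetch_page and its fake body are invented for the example; the counter is only there to show that the sub body runs once.

```perl
use strict;
use warnings;
use Memoize;

my $fetches = 0;    # counts how often the body really runs

# Hypothetical stand-in for a slow HTTP fetch.
sub fetch_page {
    my ($url) = @_;
    $fetches++;
    return "<html>page for $url</html>";
}

# Replace fetch_page with a caching wrapper keyed on its arguments.
memoize('fetch_page');

for (1 .. 3) {
    my $page = fetch_page('http://example.com/users');
}
print "$fetches\n";    # prints "1"
```

This only helps if the same URL really is requested more than once, and the cache is not refreshed if the page changes while the program runs.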
I had looked at http://perldoc.perl.org/threads.html, but it doesn't
helped so much. I believe I should add the thread support in a fashion
that it work directly with the foreach loop instruction and
GetUserData(), right?
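The simplest way to multi-thread the above is something like
    use threads;
    my @threads;
    foreach my $userID (@userIDs) {
        # async returns the new thread object; keep it so it can be joined
        push @threads, async {
            my ($name, $middlename, $lname, $bdate) =
                GetUserData($userID);
            print "$userID\t: $name, $middlename, $lname, $bdate";
            # Some irrelevant code stuff...
        };
    }
    # wait for every thread, or the program may exit while some still run
    $_->join for @threads;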
This will run each request in a new thread; but as you have identified,
the output will come out any which way. If you really want to use
threads, you want to use something like Thread::Queue to pass the
results back to the parent thread, which can then deal with printing
them.
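A sketch of that arrangement, with a fixed pool of 8 workers fed from one queue and reporting through another. It assumes a perl built with thread support; GetUserData is a fake stand-in and the user IDs are invented for the example.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the slow sub.
sub GetUserData {
    my ($id) = @_;
    return ("Name$id", "Middle$id", "Last$id", '1970-01-01');
}

my @userIDs = (1 .. 20);    # made-up IDs
my $workers = 8;
my $jobs    = Thread::Queue->new;    # parent -> workers
my $results = Thread::Queue->new;    # workers -> parent

my @pool = map {
    threads->create(sub {
        # An undef job is the signal to stop.
        while (defined(my $id = $jobs->dequeue)) {
            my @data = GetUserData($id);
            $results->enqueue(join "\t", $id, @data);
        }
    });
} 1 .. $workers;

$jobs->enqueue(@userIDs);
$jobs->enqueue((undef) x $workers);    # one stop signal per worker
$_->join for @pool;

# Only the parent prints, so the output order is under its control.
my @lines;
while (defined(my $line = $results->dequeue_nb)) {
    push @lines, $line;
}
print "$_\n"
    for sort { (split /\t/, $a)[0] <=> (split /\t/, $b)[0] } @lines;
```

The pool size caps how many requests are in flight at once, which is usually kinder to the server than one thread per user.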
However I want to take care to doesn't overwrite data (in C when we
deal with threads we have some unsafe functions that can overwrite
values - which is not good)...
This is not an issue in Perl. Threads have completely separate
variables: threads in Perl are more like Unix' fork than like
traditional C threading.
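A quick way to see this, assuming a perl built with thread support: an ordinary variable changed inside a thread is unchanged in the parent, and only variables explicitly marked :shared (from threads::shared) are visible to both.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter = 0;             # ordinary variable: each thread gets its own copy
my $shared :shared = 0;      # explicitly shared between threads

threads->create(sub {
    $counter = 42;           # changes only the thread's private copy
    $shared  = 42;           # visible to the parent
})->join;

print "$counter $shared\n";  # prints "0 42"
```

Locking only becomes a concern for the variables you choose to share.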
3) The Perl2exe (http://www.indigostar.com/perl2exe.htm) is the best
option to convert Perl code to Executables? It really work well? Even
with complicated and sophisticated code (using thread, raw sockets,
windows registry access, etc)?
I've never used perl2exe (I understand it's not free?), but I have had
success with PAR, which you can install from CPAN.
Well, that's my first code in perl, so sorry for ugly/bad code (and
also I'm not a programmer, just a curious:). hehe
That's fine: there's nothing wrong with writing bad code when you are
first learning :). The code you posted isn't half as bad as some we see
in this group, anyway...
Thank you and sorry for amount (of dumb and off-topic) questions.
Not off-topic at all, and not dumb either.
Ben