Re: [PHP] Suggestions for optimization?

From: Eugene Lee (list-php-1_at_fsck.net)
Date: 11/30/03

  • Next message: Bronislav Klucka: "RE: [PHP-DB] Dynamic Website Question!"
    Date: Sun, 30 Nov 2003 04:13:52 -0600
    To: php-general@lists.php.net
    
    

    On Sat, Nov 29, 2003 at 09:32:19AM -0800, Galen wrote:
    :
    : I'm working on some database search ranking code. It currently
    : represents 95-98% of the time spent when doing fuzzy searches. I have
    : tried my best to optimize my code - algorithmic shortcuts, eliminating
    : session variables, unsetting irrelevant results, etc., and benchmarking
    : to find the best techniques. That's given me over a 10x improvement.
    : Unfortunately, because of the number of results it must process (up to
    : 20,000), it is still somewhat slow. I think it could use some code
    : structure/formatting tweaks to eke out that last bit of performance, but
    : I don't know much about optimizing PHP code in that way. Does anybody
    : have suggestions?

    You should profile your code a bit more and see how much time is being
    spent in your foreach loops and in your usort(). You could always rewrite
    your foreach loops as for loops and manually iterate through the array
    itself instead of a copy. You also call strtolower() and trim() on every
    pass through the data. That's a lot of work that can be pre-massaged in
    the database so that you don't need to do it within your loops (granted,
    this doubles your needed storage space in the hopes of speeding up your
    fuzzy searches).
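
As a rough sketch of the last two points (not a drop-in replacement for the code below): the search terms can be lowercased, trimmed, and run through metaphone() once before the outer loop, and the rows can be walked by key so that each one is updated in place. The helper name normalize_terms(), the "distance" key, the sample rows, and the microtime() timing are assumptions made for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: normalize each search term once, outside the
// per-row loop, and cache its metaphone key alongside it.
function normalize_terms(array $terms): array
{
    $out = [];
    foreach ($terms as $field => $term) {
        $clean = strtolower(trim($term));
        if ($clean === '') {
            continue; // skip empty search terms
        }
        $out[$field] = ['text' => $clean, 'metaphone' => metaphone($clean)];
    }
    return $out;
}

// Sample data (made up for this sketch).
$search_statements = ['name' => '  Smith ', 'city' => ''];
$search_results = [
    ['name' => 'Smyth', 'city' => 'Austin'],
    ['name' => 'SMITH ', 'city' => 'Dallas'],
];

$start = microtime(true);
$terms = normalize_terms($search_statements);

// Walk the rows by key and bind each one by reference, so changes land
// in $search_results itself rather than in a by-value foreach copy.
$keys = array_keys($search_results);
for ($i = 0, $n = count($keys); $i < $n; $i++) {
    $row =& $search_results[$keys[$i]];
    foreach ($terms as $field => $term) {
        $row[$field] = strtolower(trim($row[$field]));
        $row['distance'] = levenshtein($term['text'], $row[$field]);
    }
    unset($row); // break the reference before the next iteration
}
$elapsed = microtime(true) - $start;

printf("scored %d rows in %.6f s\n", count($search_results), $elapsed);
```

Timing each stage this way (normalization, scoring, sorting) is a cheap way to see where the 95-98% actually goes before rewriting anything.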

    Also, I noticed that your usort() was doing a normal numeric sort. That
    being the case, why not switch the line:

            usort($search_results, "cmp");

    and use:

            sort($search_results, SORT_NUMERIC);
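
One caveat worth checking before making that swap: cmp() orders rows by their nested "relevancy" value from highest to lowest, whereas a plain sort() on an array of rows compares whole rows rather than that one field. A minimal sketch of one alternative, assuming PHP 5.5 or later for array_column() and using made-up sample rows, is to pull the scores into a flat array and let array_multisort() reorder the rows by it:

```php
<?php
// Sample rows (made up); only the "relevancy" key matters here.
$search_results = [
    ['id' => 'a', 'relevancy' => -2.5],
    ['id' => 'b', 'relevancy' => 4.0],
    ['id' => 'c', 'relevancy' => 1.0],
];

// Flat list of scores, in the same order as the rows.
$scores = array_column($search_results, 'relevancy');

// Sort the scores numerically, highest first, and apply the same
// reordering to $search_results.
array_multisort($scores, SORT_DESC, SORT_NUMERIC, $search_results);

foreach ($search_results as $row) {
    echo $row['id'], ' ', $row['relevancy'], "\n";
}
```

Whether this beats usort() on 20,000 rows is something to measure rather than assume.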

    : Here's my code:
    :
    : if ($search_results[0]["relevancy"] == "")
    : {
    :     function cmp($a, $b)
    :     {
    :         if ($a["relevancy"] < $b["relevancy"])
    :         {
    :             return 1;
    :         }
    :         elseif ($a["relevancy"] > $b["relevancy"])
    :         {
    :             return -1;
    :         }
    :         else
    :         {
    :             return 0;
    :         }
    :     }
    :
    :     $search_statements = $_SESSION["search"]["statements"];
    :
    :     foreach ($search_results as $key1 => $value1)
    :     {
    :         $num_fields_matched = 0;
    :         $result_score = 0;
    :         $metaphone_ratio = 0;
    :         foreach ($search_statements as $key => $value)
    :         {
    :             if ($value != "" AND $value1[$key] != $value)
    :             {
    :                 $value = strtolower(trim($value));
    :                 $value1[$key] = strtolower(trim(($value1[$key])));
    :                 $num_fields_matched++;
    :                 $value_metaphone = metaphone($value1[$key]);
    :                 $search_metaphone = metaphone($value);
    :                 $search_position = strpos($value1[$key], $value);
    :                 $string_count = substr_count($value1[$key], $value);
    :                 $levenshtein = levenshtein($value, $value1[$key],
    :                     "0.5", 1, 1);
    :
    :                 if ($search_metaphone == $value_metaphone AND
    :                     $value_metaphone != "")
    :                 {
    :                     $metaphone_ratio = 1;
    :                 }
    :                 elseif ($search_metaphone != 0)
    :                 {
    :                     $metaphone_ratio = 0.6 * (1 /
    :                         levenshtein($search_metaphone, $value_metaphone));
    :                 }
    :
    :                 $result_score = $result_score
    :                     + ($levenshtein + (8 * $search_position))
    :                     - (2 * ($string_count - 1))
    :                     - (1.1 * $metaphone_ratio * $levenshtein);
    :             }
    :             elseif ($value1[$key] == $value)
    :             {
    :                 $result_score = $result_score - 5;
    :             }
    :         }
    :         if ($num_fields_matched == 0)
    :         {
    :             $num_fields_matched = 1;
    :         }
    :         $search_results[$key1]["relevancy"] =
    :             ($result_score * -1) / $num_fields_matched;
    :
    :         if ($fuzzy_search == true AND
    :             $search_results[$key1]["relevancy"] < -5)
    :         {
    :             unset($search_results[$key1]);
    :         }
    :     }
    :
    :     usort($search_results, "cmp");
    : }

