Re: Regular Expression Function
- From: axrock <chris.sefton@xxxxxxxxx>
- Date: Sat, 27 Dec 2008 20:35:27 -0800 (PST)
On Dec 28, 1:21 pm, Jerry Stuckle <jstuck...@xxxxxxxxxxxxx> wrote:
axrock wrote:
On Dec 28, 11:31 am, Jerry Stuckle <jstuck...@xxxxxxxxxxxxx> wrote:
axrock wrote:
Hello,
I am not too good with regular expressions and was hoping somebody
could provide me some help to create a function to do the following...
I want a regular expression to compare sentences and then rate them as
a percentage.
IE:
A user types: "Hello, how are you today"
I have an array (or MySQL data) with a list of other phrases like so...
1. How are you today
2. Hi, how are you today
3. What is the time
4. What are you doing today
5. Hello how are you today
6. etc
I need the regular expression to compare the string entered by the
user and then return a list from the array rating them as a percentage
of the closest match.
IE: I would assume in this case that number 5 would rate the highest
(100% match) followed by number 2 then number 1.
I don't need to worry about punctuation like commas or quotes etc.
These will be stripped from the input first.
If anybody knows a way to do this it would be greatly appreciated.
Is a regular expression the way to go, or should I be investigating
some other method?
Many thanks!
You need a lot more than a regex for this. A regex will just compare
characters. You need to compare multiple words, for which you will need
a lexicographic comparison package.
Thanks for the input.
I have looked this up on google and still don't really understand what
that even means (Lexicographic).
Still on square one, I guess.
Sorry, I just assumed your native language was English, because it's so
good. But I know that's not necessarily the case :-)
A lexicographic comparison will parse and analyze a sentence. You can
use it with two different sentences to make a comparison based on the
words, their order, etc. Such comparison packages are CPU-intensive and
complex, and are generally implemented in a compiled language for that
reason. They also are generally not found alone, but bundled with
other products (e.g. databases, CRM systems, etc.).
A regex won't actually compare anything, but you can use it to strip
punctuation marks, etc.
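As a rough sketch of that stripping step (written in Python here for
brevity; the function name normalize is only an illustrative choice, and
PHP's preg_replace() could do the same job), a regex can reduce a
sentence to a clean list of lowercase words:

```python
import re
from typing import List


def normalize(sentence: str) -> List[str]:
    """Lowercase a sentence, drop punctuation, and split it into words."""
    # Replace every character that is neither a word character
    # (letter, digit, underscore) nor whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", sentence.lower())
    # split() with no arguments also collapses runs of whitespace.
    return cleaned.split()


print(normalize('Hello, how are you today'))
```

Note that this only prepares the text. It does no comparing at all.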
However, from there on, you still have major problems. Ordinary string
comparisons in PHP (==, strcmp()) work on a character-by-character
basis; slight differences in characters will throw things off. For
instance, take these two statements:
"In an hour the system will go down for maintenance".
"In a hour the system will go down for maintenance".
Now obviously the first statement is correct ("an" is correct here, "a"
is not). But the second is a common mistake. A lexicographical
comparison will show a very high relationship between the two
statements, but a straight character comparison will say everything
after the first 4 characters ("In a") is different.
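To make the difference concrete, here is a small sketch in Python (the
function names are made up for illustration). It contrasts a naive
position-by-position comparison with the alignment-based ratio from
Python's standard difflib module, which is only a stand-in for a real
sentence comparison package. PHP ships comparable built-ins,
similar_text() and levenshtein(), which could play the same role.

```python
from difflib import SequenceMatcher


def positional_match(a: str, b: str) -> float:
    """Fraction of positions where both strings hold the same character."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longest


def alignment_match(a: str, b: str) -> float:
    """Similarity in [0, 1] that tolerates inserted or deleted characters."""
    return SequenceMatcher(None, a, b).ratio()


correct = "In an hour the system will go down for maintenance"
mistake = "In a hour the system will go down for maintenance"

print(positional_match(correct, mistake))  # low: one missing letter shifts the rest
print(alignment_match(correct, mistake))   # close to 1: the shift is absorbed
```

The positional score collapses because every character after "In a" is
compared against its shifted neighbour, while the alignment score treats
the missing "n" as a single small difference.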
Sure, you can take care of this problem. But then you get into things
like "radiuses" instead of the correct "radii". And many other things.
So you parse into words and compare them. Now what happens if I say "In
sixty minutes the system will go down for maintenance"? Basically
identical lexicographically, but different in the words.
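A word-level comparison can be sketched as follows (Python, with a
hypothetical helper name; the overlap measure used here, shared words
divided by all distinct words, is just one simple choice). It gives
partial credit for the shared words, but it has no way of knowing that
"sixty minutes" and "an hour" mean the same thing:

```python
def word_overlap(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words (Jaccard index)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


first = "In an hour the system will go down for maintenance"
second = "In sixty minutes the system will go down for maintenance"

# 8 words are shared out of 12 distinct words across both sentences.
print(word_overlap(first, second))
```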
And this is just the tip of the iceberg. Even correctly worded
statements can be virtually identical lexicographically but very
different when performing character comparisons.
It's why these packages are so complex.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstuck...@xxxxxxxxxxxxx
==================
That was a very good explanation. I am English by the way, however I
had never heard the term before, and online dictionaries didn't really
explain it that well.
What I need does not really need to be so accurate. A good example is
a search engine, which searches for matches and lists them in order,
starting with the most accurate match.
IE: If I searched for "Hello how are you" and there was a value in an
array (or mysql db) that matched it to the point of "how are you
today" and nothing else was even close, then this would have the
highest rating, but not quite 100%.
This is really what I am after.
Whatever the user inputs, I want it to search a set of array values or
database entries and then output them in rated order from best to
worst. This is common on websites where the developer has created
their own search feature. Google is the exception because their
algorithm is very complex and looks at much more than just the words
themselves. But hopefully that explains what I am after slightly
better.
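One possible way to get that kind of rough rating, sketched in Python
under some assumptions: the names words and rank_phrases are invented
for this example, the phrases sit in a plain list rather than a
database, and the score comes from difflib's SequenceMatcher applied to
word lists. That score is only one of several reasonable measures, and
a different measure may order the near matches differently.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def words(sentence: str) -> List[str]:
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^\w\s]", " ", sentence.lower()).split()


def rank_phrases(query: str, phrases: List[str]) -> List[Tuple[int, str]]:
    """Return (percentage, phrase) pairs, best match first."""
    query_words = words(query)
    scored = []
    for phrase in phrases:
        # Compare whole words, in order, rather than single characters.
        ratio = SequenceMatcher(None, query_words, words(phrase)).ratio()
        scored.append((round(100 * ratio), phrase))
    # sorted() is stable, so ties keep their original list order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


phrases = [
    "How are you today",
    "Hi, how are you today",
    "What is the time",
    "What are you doing today",
    "Hello how are you today",
]

for percent, phrase in rank_phrases("Hello, how are you today", phrases):
    print(f"{percent:3d}%  {phrase}")
```

With this measure "Hello how are you today" rates 100% and "What is the
time" rates 0%, and a search for "Hello how are you" against "how are
you today" comes out at 75%, that is, the best match but not quite 100%.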
Thanks