Re: Regex: Why is overreaching necessary?
- From: "Shannon Jacobs" <Shannon.Jacobs.nospam@xxxxxxxxx>
- Date: 19 Feb 2007 15:33:27 -0800
On Feb 20, 3:07 am, anno4...@xxxxxxxxxxxxxxxxxxxxxx wrote:
Shannon Jacobs <Shannon.Jacobs.nos...@xxxxxxxxx> wrote in comp.lang.perl.misc:
On Feb 17, 10:58 am, anno4...@xxxxxxxxxxxxxxxxxxxxxx wrote:
Shannon Jacobs <sha...@xxxxxxxxxxxx> wrote in comp.lang.perl.misc:
[...]
I currently have .* in my first version above) so that it only considers 4
digits at a time. Here is some sample data from the file.
The Brethren 20010210282239 Fa
Gorilla, My Love 19810211042240 HF
KeitaiDenwaNoHimitsu 200102110722412242 JaChCS
Harry Potter and the Philosopher's Stone199702111722362243 Fa
In this example the first and fourth lines are proper matches against 2239
and 2243, respectively, but the third line is an undesired match against
1224. The problem as I see it is that the two things I'm thinking about
inserting should communicate with each other so that they always consume a
total of 8 characters, thereby forcing the target to consider only four
characters at a time.
Try this variant:
    @foo2 = grep substr( $_, 50, 12 ) =~
        /^(?:\d{4}){0,2}$form_values{'a_SEARCH_VALUE'}/,
        @foo1;
Essentially that ties the pattern to the beginning of the substring,
then allows zero to two groups of four digits before a match.
Anno
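
A minimal sketch of why the anchored pattern rejects the 1224 case. The field values are hypothetical: they assume that substr( $_, 50, 12 ) returns the run of four-digit numbers at the end of each sample line, padded with spaces, which the flattened sample above does not confirm. $target stands in for $form_values{'a_SEARCH_VALUE'}.

```perl
use strict;
use warnings;

# Hypothetical 12-character fields, as substr( $_, 50, 12 ) might return them.
my @fields = ( '2239        ', '2240        ', '22412242    ', '22362243    ' );

# Stand-in for $form_values{'a_SEARCH_VALUE'}.
my $target = '1224';

# Unanchored: 1224 is found straddling 2241 and 2242.
my @loose = grep { /$target/ } @fields;

# Anchored, stepping over zero to two complete groups of four digits,
# so a match can only start at offset 0, 4 or 8.
my @strict = grep { /^(?:\d{4}){0,2}$target/ } @fields;

print "unanchored matches: ", scalar(@loose),  "\n";
print "anchored matches:   ", scalar(@strict), "\n";
```

Under these assumptions the unanchored pattern matches the third field and the anchored one matches nothing, while a search for 2242 would still match the third field through one leading group.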
Sorry, but that doesn't work. I think it's because it picks up the
false matches when it has no groups of four digits before the match.
It doesn't pick up false matches from the sample you supplied.
Somehow it needs to be limited to considering only four source digits
at a time, or to think that there is a non-digit boundary between the
two groups of four digits.
(I don't think it matters, and I tested it both ways, but I think it
should be
    @foo2 = grep substr( $_, 50, 12 ) =~
        /^(?:.{4}){0,2}$form_values{'a_SEARCH_VALUE'}/,
        @foo1;
rather than your version. The data file may have spaces,
Then your sample data should have included such a case.
and I think
that \d wouldn't count them at that point.)
You are more permissive than the data requires. If you want to allow
blanks, allow blanks:
    /^(?:[\d ]{4}){0,2}$form_values{'a_SEARCH_VALUE'}/
Anno
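
The three variants discussed here differ only in what a four-character group may contain. The sketch below compares them on a hypothetical field whose first slot is blank; whether the real file contains such a layout is an assumption. $target again stands in for $form_values{'a_SEARCH_VALUE'}.

```perl
use strict;
use warnings;

# Hypothetical field: first four-character slot blank, number in the second.
my $field  = '    2243    ';
my $target = '2243';

my %pattern = (
    digits_only     => qr/^(?:\d{4}){0,2}$target/,
    digits_or_blank => qr/^(?:[\d ]{4}){0,2}$target/,
    any_char        => qr/^(?:.{4}){0,2}$target/,
);

# Record which variants accept the field.
my %matched = map { $_ => ( $field =~ $pattern{$_} ? 1 : 0 ) } keys %pattern;

printf "%-16s %s\n", $_, $matched{$_} ? 'match' : 'no match'
    for sort keys %pattern;
```

All three keep matches aligned to four-character boundaries. Only the \d version refuses to step over a blank slot, and the [\d ] version is the narrowest one that still accepts it.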
You are correct: the pattern works on the sample I provided, so the
problem is apparently with that particular sample. Tested against the
full data file, it still produces false matches. I was in a hurry to
acknowledge my error, but I don't have time this morning to do more
diagnostics.
Perhaps it is something about the presence of the third number in some
of the real data that is causing it to fail? I see that the sample I
included did not have any cases with 12 digits, but only 8.
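
One possible diagnostic for the 12-digit question, sketched under assumptions: the fields below are made up, and $target stands in for the search value. It lists every offset at which the target occurs in a field and flags those that are not multiples of four. Since {0,2} allows a match to start at offset 0, 4 or 8, a third number in the field is within reach of the anchored pattern.

```perl
use strict;
use warnings;

my $target = '2242';

# Hypothetical fields holding three four-digit numbers each.
my @fields = ( '224122422243', '122422430000' );

my @report;
for my $field (@fields) {
    my $from = 0;
    # Walk through every occurrence of the target, overlapping ones included.
    while ( ( my $at = index( $field, $target, $from ) ) >= 0 ) {
        push @report,
            { field => $field, offset => $at, aligned => ( $at % 4 == 0 ? 1 : 0 ) };
        $from = $at + 1;
    }
}

printf "%s offset %d %s\n", $_->{field}, $_->{offset},
    $_->{aligned} ? 'aligned' : 'MISALIGNED'
    for @report;
```

Run against the real substr( $_, 50, 12 ) values, a listing like this would show whether the remaining false matches are misaligned hits or aligned hits in a column that should not be searched.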
(I did test Ilya Zakharevich's suggestion from the next post, and it
performed worse, producing additional false matches. I'm eager to
study the differences, though his approach seems more complicated
than yours.)