Re: Regular Expression and Usage



On Oct 30, 1:54 pm, "inderpau...@xxxxxxxxx" <inderpau...@xxxxxxxxx>
wrote:
I'm somewhat new to regular expressions and want to know how to extract
any strings that match an IP address.

I found this on the net and wanted to know if this is the most
efficient (easiest/shortest) way to write the expression or
pattern to match.

Shouldn't you first be concerned about whether it's the most *correct*
before you worry about efficiency?

Also, in the discovered solution, why do they use the \b word
boundary switch since the characters are of a numeric type?
I'm not sure about this.

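A "word" character in Perl is any letter, digit, or underscore, and \b
matches only at a position between a word character and a non-word
character (or the start or end of the string). Therefore, the \b
prevents other digits from sitting directly next to the IP address.
That is, it prevents 921128.0.0.123423 from matching. Of course, it
also prevents HOME128.0.0.1 from matching, which may or may not be
what you want.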
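
To see the boundary behaviour for yourself, here is a small sketch that
runs the pattern quoted in this thread against three made-up strings.
The helper name find_ip and the sample strings are mine, purely for
illustration.

```perl
use strict;
use warnings;

# The pattern under discussion, compiled once.
my $ip_re = qr/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;

# Hypothetical helper: return the first match in $text, or undef.
sub find_ip {
    my ($text) = @_;
    return $text =~ /($ip_re)/ ? $1 : undef;
}

# Made-up sample strings.
my @samples = (
    'host at 128.0.0.1 is up',   # plain address surrounded by spaces
    '921128.0.0.123423',         # extra digits glued on both ends
    'HOME128.0.0.1',             # letters glued to the front
);

for my $s (@samples) {
    my $found = find_ip($s);
    printf "%-26s => %s\n", $s, defined $found ? $found : '(no match)';
}
```

Only the first sample yields a match. In the other two there is no
position where a boundary is followed by four dot-separated groups of
one to three digits that then end on another boundary.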

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

It could be shortened to:
\b(?:\d{1,3}\.){3}\d{1,3}\b
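
As a quick check that the two forms behave alike, and to show how to
pull every match out of a string (which was the original question),
something along these lines should do. The variable names and the
sample text are made up for the example.

```perl
use strict;
use warnings;

my $long  = qr/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;
my $short = qr/\b(?:\d{1,3}\.){3}\d{1,3}\b/;

# Made-up input containing good, incomplete, and out-of-range candidates.
my $text = 'from 10.0.0.1 to 192.168.1.254, not 1.2.3 or 318.99.183.999';

# With /g in list context and no capturing groups, a match returns
# every complete match found in the string.
my @from_long  = $text =~ /$long/g;
my @from_short = $text =~ /$short/g;

print "long : @from_long\n";
print "short: @from_short\n";
```

Both lists come out the same: 10.0.0.1, 192.168.1.254 and
318.99.183.999. The (?:...) group is non-capturing, so it does not
change what the match returns. Note that neither form rejects the
out-of-range 318.99.183.999.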

Or you could/should use Regexp::Common from CPAN and just write:
/\b$RE{net}{IPv4}\b/

This is not only more readable, but it also prevents such false
matches as 318.99.183.999. That is, it takes care of checking that
each individual component is in range for you.
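
If you can't install the module, the range check can be approximated by
hand. The following is only a sketch of the idea, written from scratch:
it is NOT the pattern Regexp::Common uses, and it may differ from the
module in details (for instance, this version rejects components with
leading zeros such as 010). The names $octet, $ipv4 and extract_ipv4
are made up for the example.

```perl
use strict;
use warnings;

# One component in the range 0-255. Hand-written for illustration;
# not taken from Regexp::Common.
my $octet = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;

# Four components separated by dots, with word boundaries on both ends.
my $ipv4 = qr/\b$octet(?:\.$octet){3}\b/;

# Hypothetical helper: return every in-range address found in $text.
sub extract_ipv4 {
    my ($text) = @_;
    return $text =~ /($ipv4)/g;
}

my @found = extract_ipv4(
    'ok 192.168.1.254, bad 318.99.183.999, edge 255.255.255.255 and 0.0.0.0'
);
print "$_\n" for @found;
```

Here 318.99.183.999 is skipped, while the other three addresses are
returned. For real code I'd still reach for the CPAN module rather
than maintain a pattern like this myself.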

Paul Lalli



