Re: Regular Expressions...



"Ken D'Ambrosio" <ken@xxxxxxxx> writes:

Hi, all. As a recovering Perl guy, I have to admit I don't quite "get"
the re module. For example, I'd like to do a few things (I'm going to use
phone numbers, 'cause that's what I'm currently dealing with):
12345678900 -- How would I:
- Get just the area code?
- Get just the seven-digit number?

In Perl, I'd do something like
m/^1(...)(.......)/;

Wouldn't that be better as:

m/^1(\d{3})(\d{7})$/;

I'll assume that more-precise expression in what follows.

and then I'd have the numbers in $1 and $2, respectively. But the Python
stuff simply isn't clicking for me.

In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).

Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:

(foo, bar, baz) = some_sequence

or

(foo, bar, baz) = (item for item in some_sequence)

e.g.:

>>> (foo, bar, baz) = [1, 2, 3]
>>> foo
1
>>> bar
2
>>> baz
3

So, the match returned by the various ‘re’ module match functions is
an object which allows access to the grouped matches as a sequence.
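
For someone coming from Perl, it may help to see how the match object's groups line up with $1 and $2. This is only a minimal sketch; the names `match`, `first`, `second` and `all_groups` are made up for the illustration.

```python
import re

# Hypothetical names, chosen only for this illustration.
match = re.match(r'^1(\d{3})(\d{7})$', '12345678900')

# group(1) and group(2) play the role of Perl's $1 and $2.
first = match.group(1)
second = match.group(2)

# groups() returns all the groups at once, as a tuple,
# so they can be iterated like any other sequence.
all_groups = match.groups()
for item in all_groups:
    print(item)
```

Because groups() returns an ordinary tuple, it can be unpacked in a single statement just like the sequences shown earlier.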

If anyone could supply concrete examples of how to do the problem,
above, that would be terrific.

Assuming the following:

>>> import re
>>> phone_number_regex = r'^1(\d{3})(\d{7})$'

Trivial one-shot example:

>>> phone_number = '12345678900'
>>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
>>> area_code
'234'
>>> local_number
'5678900'

More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:

>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_pattern
<_sre.SRE_Pattern object at 0xf7f8c598>

>>> phone_number = '12345678900'
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_match
<_sre.SRE_Match object at 0xf7f52338>

>>> (area_code, local_number) = phone_number_match.groups()
>>> area_code
'234'
>>> local_number
'5678900'
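
One caveat with the examples above: when the string does not match, the match functions return None, so calling .groups() on the result raises AttributeError. A minimal sketch of guarding against that follows; the function name `split_phone_number` is hypothetical, not something from the re module.

```python
import re

phone_number_pattern = re.compile(r'^1(\d{3})(\d{7})$')

def split_phone_number(phone_number):
    """Return (area_code, local_number), or None if the input does not match."""
    phone_number_match = phone_number_pattern.match(phone_number)
    if phone_number_match is None:
        # No match: there is no match object to ask for groups.
        return None
    return phone_number_match.groups()

print(split_phone_number('12345678900'))
print(split_phone_number('555-1234'))
```

Whether to return None, raise an exception, or do something else on a failed match depends on the calling code; returning None is just one reasonable choice.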

Python regular expressions also allow naming each group, for later
access to the matches via a dict:

>>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$'
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_groups = phone_number_match.groupdict()
>>> phone_number_groups['area_code']
'234'
>>> phone_number_groups['local_number']
'5678900'
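
If only one or two of the named groups are needed, building the dict is optional: the match object can be asked for a group by name directly. A short sketch, assuming the same pattern as above (the subscript form needs Python 3.6 or later):

```python
import re

phone_number_pattern = re.compile(
    r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$')
phone_number_match = phone_number_pattern.match('12345678900')

# Look up a named group directly on the match object.
area_code = phone_number_match.group('area_code')

# Subscript access is equivalent to group(); available from Python 3.6.
local_number = phone_number_match['local_number']

# Named groups are still numbered, so positional access keeps working.
by_position = phone_number_match.group(1)
```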

--
\ “… one of the main causes of the fall of the Roman Empire was |
`\ that, lacking zero, they had no way to indicate successful |
_o__) termination of their C programs.” —Robert Firth |
Ben Finney