Re: Regular Expressions...
- From: Ben Finney <bignose+hates-spam@xxxxxxxxxxxxxxx>
- Date: Thu, 08 Jan 2009 12:41:14 +1100
"Ken D'Ambrosio" <ken@xxxxxxxx> writes:
> Hi, all. As a recovering Perl guy, I have to admit I don't quite "get"
> the re module. For example, I'd like to do a few things (I'm going to use
> phone numbers, 'cause that's what I'm currently dealing with):
> 12345678900 -- How would I:
> - Get just the area code?
> - Get just the seven-digit number?
> In Perl, I'd do something like
> m/^1(...)(.......)/;
Wouldn't that be better as:
m/^1(\d{3})(\d{7})$/;
I'll assume that more-precise expression in what follows.
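To see why the tighter pattern matters, here is a minimal Python sketch comparing the two on a made-up malformed input (the string ‘1abc4567890xyz’ is hypothetical). The dotted pattern accepts letters and ignores trailing junk; the ‘\d’ and ‘$’ version rejects it.

```python
import re

# The original Perl-style pattern: any 3 characters, then any 7, no end anchor.
loose = r'^1(...)(.......)'
# The tighter pattern: exactly 3 digits, then exactly 7 digits, then end of string.
strict = r'^1(\d{3})(\d{7})$'

# Hypothetical malformed input: letters in the area code, junk at the end.
bad_input = '1abc4567890xyz'

loose_match = re.match(loose, bad_input)    # matches anyway
strict_match = re.match(strict, bad_input)  # None: 'a' is not a digit

loose_groups = loose_match.groups()         # ('abc', '4567890')
```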
> and then I'd have the numbers in $1 and $2, respectively. But the Python
> stuff simply isn't clicking for me.
In general, where a set of data is likely to be iterated, the Pythonic
way to present it is via a single iterable (instead of, in your Perl
example, separate variables).
Then, for those (generally less frequent) cases where you do want the
separate items, you can bind them in a single statement:
(foo, bar, baz) = some_sequence
or
(foo, bar, baz) = (item for item in some_sequence)
e.g.:
>>> (foo, bar, baz) = [1, 2, 3]
>>> foo
1
>>> bar
2
>>> baz
3
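One caveat with binding in a single statement: the number of names must equal the number of items, or Python raises ‘ValueError’. The same applies when unpacking a regex match's groups into names. A small sketch, using a hypothetical helper ‘unpack_three’ for illustration:

```python
def unpack_three(sequence):
    """Hypothetical helper: bind exactly three items, or return None."""
    try:
        (foo, bar, baz) = sequence
    except ValueError:
        # Raised when the sequence has too many or too few items.
        return None
    return (foo, bar, baz)

print(unpack_three([1, 2, 3]))     # (1, 2, 3)
print(unpack_three([1, 2]))        # None
print(unpack_three([1, 2, 3, 4]))  # None
```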
So, the ‘re’ module match functions (such as ‘re.match’ and ‘re.search’)
return a match object, whose ‘groups()’ method gives you the grouped
matches as a sequence (a tuple) that you can iterate over or unpack.
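For a Perl programmer, the rough mapping from ‘$1’, ‘$2’ onto the match object looks like this. A minimal sketch; note that the ‘match[n]’ indexing shortcut needs Python 3.6 or later, which postdates this thread.

```python
import re

# Raw string, so the backslash in "\d" reaches the re module intact.
match = re.match(r'^1(\d{3})(\d{7})$', '12345678900')

# groups(): every captured group as a tuple, like Perl's ($1, $2).
all_groups = match.groups()     # ('234', '5678900')

# group(n): one group by number, the analogue of Perl's $n.
area_code = match.group(1)      # '234'

# group(i, j, ...): several groups at once, as a tuple.
both = match.group(1, 2)        # ('234', '5678900')

# group(0), or group() with no argument: the whole matched text.
whole = match.group(0)          # '12345678900'

# match[n]: shorthand for match.group(n), Python 3.6 and later.
local_number = match[2]         # '5678900'
```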
> If anyone could supply concrete examples of how to do the problem
> above, that would be terrific.
Assuming the following:
>>> import re
>>> phone_number_regex = r'^1(\d{3})(\d{7})$'
Trivial one-shot example:
>>> phone_number = '12345678900'
>>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
>>> area_code
'234'
>>> local_number
'5678900'
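One thing the one-shot version glosses over: if the input doesn't match, ‘re.match’ returns ‘None’, and calling ‘.groups()’ on that raises ‘AttributeError’. A minimal sketch that checks first, using a hypothetical helper ‘split_phone_number’:

```python
import re

phone_number_regex = r'^1(\d{3})(\d{7})$'

def split_phone_number(phone_number):
    """Hypothetical helper: return (area_code, local_number), or None."""
    match = re.match(phone_number_regex, phone_number)
    if match is None:
        # No match: re.match() gave us None, so there are no groups to return.
        return None
    return match.groups()

print(split_phone_number('12345678900'))  # ('234', '5678900')
print(split_phone_number('2345678900'))   # None (no leading 1)
```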
More explicit example, showing the various steps and assuming you want
to re-use the various values in multiple statements:
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_pattern
<_sre.SRE_Pattern object at 0xf7f8c598>
>>> phone_number = '12345678900'
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_match
<_sre.SRE_Match object at 0xf7f52338>
>>> (area_code, local_number) = phone_number_match.groups()
>>> area_code
'234'
>>> local_number
'5678900'
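The compiled pattern pays off when the same regex is applied to many inputs. A minimal sketch, looping over some hypothetical sample numbers and recording ‘None’ for anything that doesn't match:

```python
import re

phone_number_pattern = re.compile(r'^1(\d{3})(\d{7})$')

# Hypothetical sample data, for illustration only.
phone_numbers = ['12345678900', '18005551234', 'not a number']

results = []
for phone_number in phone_numbers:
    match = phone_number_pattern.match(phone_number)
    if match:
        # Unpack the two groups, as in the one-shot example.
        (area_code, local_number) = match.groups()
        results.append((area_code, local_number))
    else:
        results.append(None)

print(results)  # [('234', '5678900'), ('800', '5551234'), None]
```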
Python regular expressions also allow naming each group, for later
access to the matches via a dict:
>>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$'
>>> phone_number_pattern = re.compile(phone_number_regex)
>>> phone_number_match = phone_number_pattern.match(phone_number)
>>> phone_number_groups = phone_number_match.groupdict()
>>> phone_number_groups['area_code']
'234'
>>> phone_number_groups['local_number']
'5678900'
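You don't have to build the dict to use the names: a named group can be fetched directly from the match object, and it still keeps its number. A minimal sketch; the ‘match['name']’ form needs Python 3.6 or later.

```python
import re

phone_number_pattern = re.compile(
    r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$')
match = phone_number_pattern.match('12345678900')

# group(name): fetch one named group directly.
area_code = match.group('area_code')   # '234'

# match[name]: indexing by name, Python 3.6 and later.
local_number = match['local_number']   # '5678900'

# Named groups are still numbered, so positional access keeps working.
same_area_code = match.group(1)        # '234'

# groupdict(): every named group mapped to its matched text.
groups = match.groupdict()  # {'area_code': '234', 'local_number': '5678900'}
```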
--
\ “… one of the main causes of the fall of the Roman Empire was |
`\ that, lacking zero, they had no way to indicate successful |
_o__) termination of their C programs.” —Robert Firth |
Ben Finney