Re: text processing problem
- From: "Matt" <matthew_shomphe@xxxxxxxxxxxxxxx>
- Date: 7 Apr 2005 16:14:01 -0700
Maurice LING wrote:
> Hi,
>
> I'm looking for a way to do this: I need to scan a text (paragraph or
> so) and look for occurrences of "<text-x> (<text-x>)". That is, if the
> text just before the open bracket is the same as the text in the
> brackets, then I have to delete the brackets, with the text in it.
>
> Does anyone know a way to achieve this?
>
> The closest I've seen is
> (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305306) by
> Raymond Hettinger
>
> >>> s = 'People of [planet], take us to your leader.'
> >>> d = dict(planet='Earth')
> >>> print convert_template(s) % d
> People of Earth, take us to your leader.
>
> >>> s = 'People of <planet>, take us to your leader.'
> >>> print convert_template(s, '<', '>') % d
> People of Earth, take us to your leader.
>
> """
>
> import re
>
> def convert_template(template, opener='[', closer=']'):
>     opener = re.escape(opener)
>     closer = re.escape(closer)
>     pattern = re.compile(opener + '([_A-Za-z][_A-Za-z0-9]*)' + closer)
>     return re.sub(pattern, r'%(\1)s', template.replace('%','%%'))
>
> Cheers
> Maurice
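Try this:

import re
# (\w+) captures a word; \1 requires the same word inside the brackets
my_expr = re.compile(r'(\w+) (\(\1\))')
s = "this is (is) a test"
# keep the captured word, drop the space and the bracketed repeat
print(my_expr.sub(r'\1', s))
#prints 'this is a test'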
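
One caveat with the pattern above: \w+ matches a single word and is not anchored at a word boundary, so "this (is) a test" would also lose its brackets (the "is" at the end of "this" matches). A possible refinement, only a sketch, adds \b and lets the repeated text span several space-separated words. The names REPEAT_IN_BRACKETS and drop_repeated_brackets are made up for illustration, and it assumes exactly one space before the bracket and a case-sensitive repeat:

```python
import re

# Hypothetical names, not from the original post.
# \b anchors the match at the start of a word; (?: \w+)* lets the
# captured text span several words separated by single spaces.
REPEAT_IN_BRACKETS = re.compile(r'\b(\w+(?: \w+)*) \(\1\)')

def drop_repeated_brackets(text):
    """Remove ' (X)' wherever it directly follows the same text X."""
    return REPEAT_IN_BRACKETS.sub(r'\1', text)

print(drop_repeated_brackets("this is (is) a test"))         # this is a test
print(drop_repeated_brackets("New York (New York) is big"))  # New York is big
print(drop_repeated_brackets("this (is) a test"))            # unchanged
```

If the text can contain punctuation or differing case, the character class and flags would need adjusting to suit.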
M@