Re: Program inefficiency?



On Sep 29, 5:22 pm, hall.j...@xxxxxxxxx wrote:
> I wrote the following simple program to loop through our help files
> and fix some errors (in case you can't see the subtle RE search that's
> happening, we're replacing spaces in bookmarks with _'s)
>
> The program works great except for one thing. It's significantly
> slower through the later files in the search than through the early
> ones... Before anyone criticizes, I recognize that the middle section
> could be simplified with a for loop... I just haven't cleaned it
> up...
>
> The problem is that the first 300 files take about 10-15 seconds and
> the last 300 take about 2 minutes... If we do more than about 1500
> files in one run, it just hangs up and never finishes...
>
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?

Ugh, that was entirely too many regexps for my taste :-)

How about something like:

def attr_ndx_iter(txt, attribute):
    "Return all the start and end indices for the values of attribute."
    txt = txt.lower()
    attribute = attribute.lower() + '='
    alen = len(attribute)
    chunks = txt.split(attribute)
    if len(chunks) == 1:
        return

    # index of the opening quote of the first attribute value
    start = len(chunks[0]) + alen
    end = -1

    for chunk in chunks[1:]:
        qchar = chunk[0]  # quote character (values are assumed quoted)
        end = start + chunk.index(qchar, 1)
        yield start + 1, end
        start += len(chunk) + alen

def substr_map(txt, indices, fn):
    "Apply fn to text within indices."
    res = []
    cur = 0

    for i, j in indices:
        res.append(txt[cur:i])
        res.append(fn(txt[i:j]))
        cur = j

    res.append(txt[cur:])
    return ''.join(res)

def transform(s):
    "The transformation to do on the attribute values."
    return s.replace(' ', '_')

def zap_spaces(txt, *attributes):
    for attr in attributes:
        txt = substr_map(txt, attr_ndx_iter(txt, attr), transform)
    return txt
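
For a quick sanity check of the helpers above, here is a self-contained sketch (Python 3 syntax, slightly condensed) run against a made-up sample string. It assumes every attribute value is quoted and that lowercasing leaves the text length unchanged, which holds for ASCII input. The sample string is hypothetical, not taken from the real help files.

```python
def attr_ndx_iter(txt, attribute):
    """Yield (start, end) indices of each quoted value of attribute."""
    txt = txt.lower()
    attribute = attribute.lower() + '='
    alen = len(attribute)
    chunks = txt.split(attribute)
    start = len(chunks[0]) + alen   # index of the first opening quote
    for chunk in chunks[1:]:
        qchar = chunk[0]            # assumed to be ' or "
        end = start + chunk.index(qchar, 1)
        yield start + 1, end
        start += len(chunk) + alen


def substr_map(txt, indices, fn):
    """Apply fn to the slices of txt given by indices."""
    res = []
    cur = 0
    for i, j in indices:
        res.append(txt[cur:i])
        res.append(fn(txt[i:j]))
        cur = j
    res.append(txt[cur:])
    return ''.join(res)


def zap_spaces(txt, *attributes):
    """Replace spaces with underscores inside the given attributes' values."""
    for attr in attributes:
        txt = substr_map(txt, attr_ndx_iter(txt, attr),
                         lambda s: s.replace(' ', '_'))
    return txt


# Hypothetical sample input, not taken from the real help files.
sample = '<a href="#my book mark">go to my book mark</a> <a NAME=\'my book mark\'>'
result = zap_spaces(sample, 'href', 'name')
print(result)
```

Only the text between the quotes changes; the link text keeps its spaces. One caveat with the plain split: 'name=' would also match inside a longer attribute such as 'filename=', so it may be worth checking the real files for that.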

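def mass_replace():
    import sys
    w = sys.stdout.write

    for line in open(r'pathname\editfile.txt'):
        f = line.strip()  # drop the trailing newline from the file name
        try:
            # Read first: open(f, 'w') truncates the file as soon as it runs.
            txt = open(f).read()
            open(f, 'w').write(zap_spaces(txt, 'href', 'name'))
            w('.')  # progress-meter :-)
        except Exception:
            print('Error processing file: %s' % f)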
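
The file loop is the part most worth trying on throwaway data before pointing it at real help files. Below is a minimal sketch in Python 3 syntax, assuming the list file holds one path per line. The name `rewrite_files` and the stand-in transform are hypothetical, not part of the code above. It strips the newline from each path and finishes reading before reopening the file for writing, since opening with 'w' truncates the file immediately.

```python
import os
import tempfile


def rewrite_files(list_path, transform):
    """Apply transform to each file named (one per line) in list_path.

    Returns the list of paths that could not be processed.
    """
    failed = []
    with open(list_path) as listing:
        for line in listing:
            path = line.strip()               # drop the trailing newline
            if not path:
                continue
            try:
                with open(path) as src:       # read everything first ...
                    text = src.read()
                with open(path, 'w') as dst:  # ... then truncate and write
                    dst.write(transform(text))
            except (IOError, OSError):
                failed.append(path)
    return failed


# Demo in a throwaway directory with one real and one missing file.
tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, 'a.htm')
missing = os.path.join(tmpdir, 'missing.htm')
listing_path = os.path.join(tmpdir, 'editfile.txt')

with open(good, 'w') as fh:
    fh.write('<a name="my book mark">')
with open(listing_path, 'w') as fh:
    fh.write(good + '\n' + missing + '\n')

# Stand-in transform; with the helpers from the post this would be
# lambda text: zap_spaces(text, 'href', 'name').
failed = rewrite_files(listing_path, lambda text: text.replace(' ', '_'))

with open(good) as fh:
    rewritten = fh.read()
print(rewritten, failed)
```

Collecting the failed paths in a list instead of printing them inline makes it easy to rerun just those files afterwards.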

minimally-tested'ly y'rs
-- bjorn
