Speed ain't bad

From: Bulba! (bulba_at_bulba.com)
Date: 12/31/04


Date: Fri, 31 Dec 2004 01:41:13 +0100


One of the posters inspired me to profile my newbie script (pasted
below). After taking measurements, I found that the speed of Python,
at least in the area where my script works, is surprisingly high.
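For readers on a current Python: the post profiled with the old pure-Python profile module from the Pythonwin shell (hence the p3 object and the profile:0(z3()) row in the output further down). A minimal sketch of the same workflow with cProfile and pstats follows. workload() is a hypothetical stand-in for the post's z3() function, whose body is not shown; here it only runs zlib over some in-memory bytes.

```python
import cProfile
import io
import pstats
import zlib


def workload():
    # Hypothetical stand-in for z3(): compress the same buffer several times.
    data = b"spam and eggs " * 50000
    for _ in range(20):
        zlib.compress(data, 6)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string instead of printing straight to stdout.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(10)  # top 10 rows, like print_stats(40) below
report = out.getvalue()
print(report)
```

To profile a real run, replace workload() with the call you care about; pstats.Stats also accepts a file name written by cProfile.run(..., filename), which is how a saved file such as p3.tmp is reloaded later.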

This is the experiment: a script recreates the folder hierarchy
somewhere else and stores in it a compressed copy of every file from
the source hierarchy (the script makes additional backups of the file
server's disk at the company where I work onto other disks,
compressing the files to save space). The data was:

468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)

WinRAR v3.20 (ZIP format, normal compression) needed 119 seconds to
compress all of it.

The Python script time (running under profiler) was, drumroll...

198 seconds.

Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so it
actually had more work to do than WinRAR did. The size of the
compressed data was basically the same in both cases, about 207 MB.
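The comparison is easy to put into throughput terms. The sketch below just redoes that arithmetic with the figures from the post (468 MB in, about 207 MB out, 119 s for WinRAR vs. 198 s for the script). Keep in mind that the 198 s was measured under the profiler, which adds overhead of its own, so if anything the ratio is unfavourable to the script.

```python
# Figures reported in the post.
ORIGINAL_MB = 468.0
COMPRESSED_MB = 207.0
WINRAR_SECONDS = 119.0
PYTHON_SECONDS = 198.0  # measured under the profiler, so somewhat pessimistic


def compare(original_mb, compressed_mb, fast_s, slow_s):
    """Return both throughputs, the time ratio and the compressed fraction."""
    return {
        "winrar_mb_per_s": original_mb / fast_s,
        "python_mb_per_s": original_mb / slow_s,
        "time_ratio": slow_s / fast_s,
        "compressed_fraction": compressed_mb / original_mb,
    }


result = compare(ORIGINAL_MB, COMPRESSED_MB, WINRAR_SECONDS, PYTHON_SECONDS)
for key, value in result.items():
    print(f"{key:20s} {value:.2f}")
```

With these numbers the script reaches about 60% of WinRAR's throughput (about 2.36 vs. 3.93 MB/s), and both tools end up storing roughly 44% of the original volume.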

I find it very encouraging that, in a real-world application, a newbie
script written in a very high-level language can perform not that far
from a "shrinkwrap" professional archiver (WinRAR is an excellent
archiver, both in compression ratio and in speed). I do realize that
this is mainly the result of Python's "underlying infrastructure"
(zipfile leaves the actual deflate compression to the zlib library,
which is written in C). Great work, guys. Congrats.
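Much of that infrastructure is zlib: with ZIP_DEFLATED, zipfile stores raw deflate data produced through the zlib module, which wraps the C zlib library. The sketch below (Python 3, standard library only, sample data made up) compresses the same bytes once through zipfile and once directly with zlib at the default level and compares the sizes. On CPython they should match, because zipfile builds its compressor with the same zlib.compressobj settings (default level, raw deflate, no zlib header).

```python
import io
import zipfile
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 2000

# Build an in-memory ZIP archive with one deflated member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.txt", data)

# Read back the member's metadata.
with zipfile.ZipFile(buf) as archive:
    info = archive.getinfo("sample.txt")
    zip_member_size = info.compress_size

# Compress the same bytes directly as a raw deflate stream
# (negative wbits: no zlib header or trailer) at the default level.
co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
raw_deflate = co.compress(data) + co.flush()

print(info.file_size, zip_member_size, len(raw_deflate))
```

The Python-level code around that call (headers, CRCs, bookkeeping) is comparatively cheap, which is why a short script can get close to a dedicated archiver on this kind of job.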

What I'm still missing in this picture is whether my script could be
optimised further (not that I actually need better performance; I'm
just curious what the possible solutions are).

Any takers among the experienced guys?

Profiling results:

>>> p3.sort_stats('cumulative').print_stats(40)
Fri Dec 31 01:04:14 2004 p3.tmp

         580543 function calls (568607 primitive calls) in 198.124 CPU seconds

   Ordered by: cumulative time
   List reduced from 69 to 40 due to restriction <40>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         1    0.013    0.013  198.124  198.124 profile:0(z3())
         1    0.000    0.000  198.110  198.110 <string>:1(?)
         1    0.000    0.000  198.110  198.110 <interactive input>:1(z3)
         1    1.513    1.513  198.110  198.110 zmtree3.py:26(zmtree)
     15057   14.504    0.001  186.961    0.012 zmtree3.py:7(zf)
     15057  147.582    0.010  148.778    0.010 C:\Python23\lib\zipfile.py:388(write)
     15057   12.156    0.001   12.156    0.001 C:\Python23\lib\zipfile.py:182(__init__)
     32002    7.957    0.000    8.542    0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890    2.550    0.000    8.143    0.004 C:\Python23\lib\os.py:206(walk)
     30114    3.164    0.000    3.164    0.000 C:\Python23\lib\zipfile.py:483(close)
     60228    1.753    0.000    2.149    0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
     45171    0.538    0.000    2.116    0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
     15057    1.285    0.000    1.917    0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
     33890    0.688    0.000    1.419    0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
    109175    0.783    0.000    0.783    0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
     15057    0.196    0.000    0.768    0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
     33890    0.433    0.000    0.731    0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
     15057    0.544    0.000    0.632    0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
     32002    0.431    0.000    0.585    0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
     15057    0.555    0.000    0.555    0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
     15057    0.483    0.000    0.483    0.000 C:\Python23\lib\zipfile.py:116(__init__)
       151    0.002    0.000    0.435    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
       151    0.002    0.000    0.432    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
       151    0.013    0.000    0.430    0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
        76    0.087    0.001    0.405    0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
     15057    0.239    0.000    0.340    0.000 C:\Python23\lib\zipfile.py:479(__del__)
     15057    0.157    0.000    0.157    0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
     32002    0.154    0.000    0.154    0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
        76    0.007    0.000    0.146    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
        76    0.007    0.000    0.137    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
        76    0.011    0.000    0.118    0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
        76    0.110    0.001    0.112    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
        76    0.079    0.001    0.081    0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
        76    0.018    0.000    0.020    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
        76    0.006    0.000    0.018    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
       227    0.003    0.000    0.012    0.000 C:\Python23\lib\Queue.py:172(get_nowait)
        76    0.007    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
       532    0.011    0.000    0.011    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
        76    0.001    0.000    0.010    0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
      1888    0.009    0.000    0.009    0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
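A quick way to read that table: tottime is the time spent in a function itself, excluding the profiled functions it calls, while cumtime includes them. Because tottime values do not overlap, they can be turned into shares of the 198.124 s total. The sketch below does that for the largest rows, with the numbers copied from the table; the "everything else" line simply lumps the remaining rows together.

```python
# Exclusive ("tottime") seconds for the largest rows of the profile above.
TOTAL_SECONDS = 198.124
TOTTIME = {
    "zipfile.py:388(write)": 147.582,
    "zmtree3.py:7(zf)": 14.504,
    "zipfile.py:182(__init__)": 12.156,
    "ntpath.py:266(isdir)": 7.957,
    "zipfile.py:483(close)": 3.164,
    "os.py:206(walk)": 2.550,
}


def time_shares(tottime, total):
    """Return each function's exclusive time as a fraction of the whole run."""
    return {name: seconds / total for name, seconds in tottime.items()}


shares = time_shares(TOTTIME, TOTAL_SECONDS)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {share:6.1%}")
print(f"{'everything else':28s} {1 - sum(shares.values()):6.1%}")
```

By this measure roughly three quarters of the run is spent inside ZipFile.write. There is no separate row for zlib in this output, so the compression work itself appears to be counted in write's own row; the directory walking and path handling are a much smaller slice, which limits how much a faster traversal could save.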

---
Script:
#!/usr/bin/python
import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED
def zf(sfpath, targetdir):
    # On Windows, drop the drive letter ("C:") so that the source path
    # can be appended to the target directory.
    if sys.platform[:3] == 'win':
        tgfpath = sfpath[2:]
    else:
        tgfpath = sfpath
    zfdir = os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfname = os.path.basename(tgfpath)
    zfpath = zfdir + os.path.sep + zfname + '.zip'
    if not os.path.isdir(zfdir):
        os.makedirs(zfdir)
    archive = ZipFile(zfpath, 'w', ZIP_DEFLATED)
    # Store the file inside the archive under its original name
    # (not under "<name>.zip").
    archive.write(sfpath, zfname, ZIP_DEFLATED)
    archive.close()
    ssize = os.stat(sfpath).st_size
    zsize = os.stat(zfpath).st_size
    return (ssize, zsize)
def zmtree(sdir,tdir):
    n=0
    ssize=0
    zsize=0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for file in files:
            res=zf(os.path.join(root,file),tdir)
            ssize+=res[0]
            zsize+=res[1]
            n=n+1
            #sys.stdout.write('.')
            if (n % 200 == 0):
                print "  %.2fM (%.2fM)" % (ssize/1048576.0,
                                           zsize/1048576.0)
                #sys.stdout.write(' ')
    return (n, ssize, zsize)
                
if __name__=="__main__":
    if len(sys.argv) == 3:
        if os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2]):
            (n, ssize, zsize) = zmtree(os.path.abspath(sys.argv[1]),
                                       os.path.abspath(sys.argv[2]))
            print ("\n\n Summary:\n"
                   "  Number of files compressed: %d\n"
                   "  Total size of original files: %.2fM\n"
                   "  Total size of compressed files: %.2fM"
                   % (n, ssize/1048576.0, zsize/1048576.0))
            sys.exit(0)
        else:
            print "Incorrect arguments."
            if not os.path.isdir(sys.argv[1]):
                print sys.argv[1] + " is not a directory."
            if not os.path.isdir(sys.argv[2]):
                print sys.argv[2] + " is not a directory."
    print "\n Usage:\n " + sys.argv[0] + " source-directory target-directory"
    sys.exit(1)
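As one possible answer to the optimisation question, here is a Python 3 sketch (Python 3.7 or later, for the compresslevel argument) of the same job. mirror_compress() is a hypothetical name, not part of the script above, and it differs from it in a few deliberate ways: it mirrors the tree relative to the source directory instead of appending the whole absolute path (minus the drive letter) to the target; it creates each target directory once per os.walk() step, including empty ones, instead of calling isdir() for every file; and it exposes the zlib compression level. Given the profile, the compression level is the knob most likely to matter; the path changes mostly remove per-file overhead.

```python
import os
import zipfile


def mirror_compress(source_dir, target_dir, compresslevel=None):
    """Mirror source_dir under target_dir, storing each file as <name>.zip.

    Returns (file_count, original_bytes, compressed_bytes), like zmtree().
    compresslevel=None keeps zlib's default level.
    """
    count = original = compressed = 0
    for root, _dirs, files in os.walk(source_dir):
        # One makedirs() per directory instead of one isdir() check per file.
        rel = os.path.relpath(root, source_dir)
        out_dir = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(out_dir, name + ".zip")
            # The context manager closes the archive even if write() fails.
            with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED,
                                 compresslevel=compresslevel) as archive:
                archive.write(src, arcname=name)
            count += 1
            original += os.path.getsize(src)
            compressed += os.path.getsize(dst)
    return count, original, compressed
```

For example, mirror_compress(r"D:\data", r"E:\backup", compresslevel=1) would trade some compression ratio for speed; whether that trade is worthwhile for this data is something to measure rather than assume. As with the original script, the target directory should not lie inside the source tree, or os.walk() may descend into the freshly written archives.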
--
It's a man's life in a Python Programming Association.

