Re: Tcl faster than Perl/Python...but only with tricks...
- From: George Petasis <petasis@xxxxxxxxxxxxxxxxx>
- Date: Sat, 30 Dec 2006 15:52:47 +0200
Dear Stephan,
Yes, there are tricks for making code fast in any language, but there are also simple measures (like placing your code in a proc) that make your code run faster.
I have tried some small variations, mimicking your first python code (I don't know enough perl to understand the perl code). For me,
the tcl code that does the same as the python code is:
proc do {} {
set f [open bigfile r]
foreach { s } [split [read $f] \n] {
if { [regexp -nocase {destroy} $s] } {
puts $s
}
}
}
do
I have enclosed the code in a proc, and I am treating the file as a list, as python does :-)
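For comparison, here is a minimal Python 3 sketch of the same read-everything-then-split strategy used in the proc above. The helper name `grep_split` is hypothetical and this is an editorial translation, not code from either thread; it assumes the whole file fits in memory, exactly as `[split [read $f] \n]` does.

```python
import re

# Case-insensitive pattern, the counterpart of [regexp -nocase {destroy} $s]
PATTERN = re.compile(r"destroy", re.IGNORECASE)


def grep_split(text):
    """Hypothetical helper: split the whole buffer on newlines (as
    [split [read $f] \\n] does in Tcl) and keep the lines that match."""
    return [line for line in text.split("\n") if PATTERN.search(line)]
```

Usage, assuming the file decodes as UTF-8: `with open("bigfile", encoding="utf-8", errors="replace") as f: print("\n".join(grep_split(f.read())))`.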
The above code is faster than the first python version. Perl is of course unbeatable, and the second python version uses a special feature of python (a filter applied to a file?), which to me is equivalent to your "regexp -all" trick. So, my ranking is: :-)
1st: "tricky" tcl (regexp -all on all data)
2nd: perl
3rd: "tricky" python (using filter)
4th: tcl (code in proc, split on [read $f], string match & regexp)
5th: python
6th: tcl with gets (in proc)
7th: tcl with gets (no proc)
It would be interesting to have a version of python that reads line by line (like gets in tcl) for comparison (I don't know python, so perhaps I am asking something silly?). Also, it is unclear to me whether perl (5.8.8) works in unicode mode or in 8-bit mode (on my Fedora Core 6, tcl uses utf-8, and I suppose python also supports unicode). Anyway, the numbers on my system are (measured with tcl time for 15 iterations):
perl perl.pl: 244442 microseconds per iteration
python python.py: 582543 microseconds per iteration
python python2.py: 527532 microseconds per iteration
tclsh tcl.tcl: 1020712 microseconds per iteration
tclsh tcl2.tcl: 816473 microseconds per iteration
tclsh tcl3.tcl: 550781 microseconds per iteration
tclsh tcl4.tcl: 568477 microseconds per iteration
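On the question above about a line-by-line python version: iterating over a file object, as python.py does, already yields one line at a time. An explicit `readline()` loop is closer in shape to the Tcl `gets` loop, so a sketch like the following could be timed against tcl.tcl/tcl2.tcl. The function name `grep_readline` is a hypothetical illustration, not code from the thread.

```python
import io
import re

PATTERN = re.compile(r"destroy", re.IGNORECASE)


def grep_readline(stream):
    """Read one line per readline() call, mirroring a Tcl
    [gets $f line] loop, and return the matching lines with any
    trailing CR/LF removed (like the rstrip in python.py)."""
    matches = []
    while True:
        line = stream.readline()
        if line == "":  # readline() returns "" only at end of file
            break
        if PATTERN.search(line):
            matches.append(line.rstrip("\r\n"))
    return matches


if __name__ == "__main__":
    # In-memory demo; pass open("bigfile") instead for the real data set.
    print(grep_readline(io.StringIO("one\nDESTROY me\r\ntwo\n")))
```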
========================================================
perl.pl (244442 microseconds per iteration)
========================================================
open(F, 'bigfile') or die;
while(<F>) {
s/[\n\r]+$//;
print "$_\n" if m/destroy/oi;
}
========================================================
python.py (582543 microseconds per iteration)
========================================================
import re
r = re.compile(r'destroy', re.IGNORECASE)
for s in file('bigfile'):
if r.search(s): print s.rstrip("\r\n")
========================================================
python2.py (527532 microseconds per iteration)
========================================================
import re
r = re.compile(r'destroy', re.IGNORECASE)
def stripit(x):
return x.rstrip("\r\n")
print "\n".join( map(stripit, filter(r.search, file('bigfile'))) )
========================================================
tcl.tcl (1020712 microseconds per iteration)
========================================================
set f [open bigfile r]
while { [gets $f line] >= 0} {
if {[string match -nocase {*destroy*} $line]} {
puts $line
}
}
========================================================
tcl2.tcl (816473 microseconds per iteration)
========================================================
proc do {} {
set f [open bigfile r]
while { [gets $f line] >= 0} {
if {[string match -nocase {*destroy*} $line]} {
puts $line
}
}
}
do
========================================================
tcl3.tcl (550781 microseconds per iteration)
========================================================
proc do {} {
set f [open bigfile r]
foreach { s } [split [read $f] \n] {
if { [regexp -nocase {destroy} $s] } {
puts $s
}
}
}
do
========================================================
tcl4.tcl (568477 microseconds per iteration)
========================================================
proc do {} {
set f [open bigfile r]
foreach s [split [read $f] \n] {
if {[string match -nocase {*destroy*} $s]} {
puts $s
}
}
}
do
The code of the benchmark script is:
set list {perl perl.pl
python python.py python python2.py
tclsh tcl.tcl tclsh tcl2.tcl
tclsh tcl3.tcl tclsh tcl4.tcl}
foreach {exe code} $list {
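# The repetition count is the second argument of [time], not of [exec];
# [exec] returns each program's output instead of printing it.
set time($code) [time [list exec $exe $code] 15]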
puts "$exe $code: $time($code)"
}
foreach {exe code} $list {
puts "========================================================"
puts " $code ($time($code))"
puts "========================================================"
set f [open $code]; puts [string trim [read $f]]; close $f
}
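Tcl's `time script ?count?` runs the script `count` times and reports the average in microseconds per iteration. A rough Python analogue of that measurement, sketched under the assumption that wall-clock time via `time.perf_counter()` is what we want, looks like this; the name `time_per_iteration` is hypothetical.

```python
import time


def time_per_iteration(func, count=15):
    """Rough analogue of Tcl's [time script count]: call func() count
    times and return the mean wall-clock time in microseconds."""
    if count < 1:
        raise ValueError("count must be at least 1")
    start = time.perf_counter()
    for _ in range(count):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1_000_000 / count
```

To time an external program as the Tcl harness does with `exec`, one could pass something like `lambda: subprocess.run(["perl", "perl.pl"], capture_output=True, check=True)`, which captures the output much as `exec` does.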
Regards,
George
Stephan Kuhagen wrote:
Hello.
Currently there is a thread in c.l.python
(http://groups.google.de/group/comp.lang.python/browse_thread/thread/923e34e8466ac920/233f1310151e19f6)
about whether Python can beat Perl at a small text-matching task. Bear with
me: a really fast Tcl solution comes at the end, but I'd like to describe the
Perl and Python versions first, and then present Tcl along with some
questions about the performance of certain Tcl commands.
The task was to match the word "destroy", case-insensitively, in a text from
gutenberg.org (the King James Bible). The test file was generated
this way:
$ wget http://www.gutenberg.org/files/7999/7999-h.zip
$ unzip 7999-h.zip
$ cd 7999-h
$ cat *.htm > bigfile
$ du -h bigfile
8.2M bigfile
The Perl code in that thread was:
---
open(F, 'bigfile') or die;
while(<F>) {
s/[\n\r]+$//;
print "$_\n" if m/destroy/oi;
}
---
This finds all lines containing "destroy" (case-insensitive) really fast and
prints them. On my computer (Linux 2.6.18, 2.6 GHz Pentium 4) this took
0.273 s for Perl (for all measurements I used the average of the last three
of four runs, discarding the first to allow for caching).
The Python version was:
---
import re
r = re.compile(r'destroy', re.IGNORECASE)
for s in file('bigfile'):
if r.search(s): print s.rstrip("\r\n")
---
Also fast, I think: 0.622 s. After some iterations, the Python folks came up
with this faster solution (0.526 s):
---
import re
r = re.compile(r'destroy', re.IGNORECASE)
def stripit(x):
return x.rstrip("\r\n")
print "\n".join( map(stripit, filter(r.search, file('bigfile'))) )
---
I wondered how this would perform in Tcl, so I first wrote the
straightforward version, which resembles the other versions:
---
set f [open bigfile r]
while { [gets $f line] >= 0} {
if {[string match -nocase "*destroy*" $line]} {
puts $line
}
}
---
0.937 s. Ouch... (Tcl 8.4.13; with 8.5a4 I got an even worse 1.2 s)
I wondered what makes Tcl so slow here. Commenting out the if...puts part
made the script twice as fast (and useless, of course...). That shows that
matching and printing took only half of the time, which surprised me: I
thought reading the file and running through the while loop should take
almost no time...
So my question is: why are [gets] and/or [while] so slow, and is there a
chance to improve that? For text processing these are two very central
commands...
I keep thinking about all the Usenet threads and preconceptions about Tcl's
slowness (just have a look at the current thread in c.l.tcl: "Is Tcl work
for large programs?"). Tcl CAN be really fast, but you need some tricks and
knowledge, which are far from obvious... After some thinking, I came up
with this:
---
set f [open bigfile r]
puts [join [regexp -all -inline -linestop -nocase {.*destroy.*\n} [read $f]] {}]
---
0.223 s (8.5a4: 0.241 s). Wow! Faster than Perl and at least as unreadable as
Perl; the Perl guys would love it! ;-)
But I don't. It doesn't look good, and it uses an unfair trick: reading the
whole file into memory. That does not work if the file is too large for
memory, which would be no problem for the Perl/Python versions. The only
good thing about this version is that it shows that Tcl's regexp is nearly
as fast as Perl's, which is really good, I think.
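For readers coming from the Python side, the whole-buffer regexp trick above has a rough counterpart in Python's `re.findall`. This is an editorial translation, not code from either thread, and `grep_whole_buffer` is a hypothetical name. It shares the same limits: the whole file must fit in memory, and a final line without a trailing newline is not matched.

```python
import re

# "." does not match a newline in Python's re unless re.DOTALL is given,
# which plays the role of Tcl's -linestop; re.IGNORECASE matches -nocase.
LINE_PATTERN = re.compile(r".*destroy.*\n", re.IGNORECASE)


def grep_whole_buffer(text):
    """Collect every newline-terminated line containing "destroy" in one
    pass over the whole buffer (like [regexp -all -inline ...]) and join
    the matches, as [join ... {}] does in the Tcl version."""
    return "".join(LINE_PATTERN.findall(text))
```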
So, I could beat Perl's and Python's performance with Tcl, but it does not
really make me happy...
Regards
Stephan