Re: diff type operations in TCL



Donal K. Fellows wrote:
MH wrote:
Well, that depends. I guess doing a chunk in binary mode would be
reasonably fast. Does Tcl actually DO a memcmp on the results, though, or
does it first convert them to some other representation in order to do a
binary comparison?

As for speed: doing Tcl_Gets on a channel (inside a C++ program) versus
fdreopen and fgets, for a 1-million-line file, adds a ~1.5 second penalty to
my parsing program.

I was thinking about handling the files as binary (so no character set
conversion, leading to the channel system managing its buffers using
memcpy instead of slower operations), using [read] (so fixed size chunks
which again encourage fast data handling), and using [string compare]
(which really does do memcmp; I've checked!). If the chunks are fairly
large (a few megabytes is reasonable on a modern machine) the overall
comparison should be quick. This is considerably at odds with what you
were proposing...

Indeed, what I'm proposing is exactly this:

proc filesEqual {file1 file2 {chunk 8388608}} {
    # Test the "Duh!" case first
    if {[file size $file1] != [file size $file2]} {
        return 0
    }

    # Written to use 8.5a4 features...
    set f1 [open $file1 rb]; set f2 [open $file2 rb]
    # Otherwise, use [fconfigure $f1 -translation binary] here

    while {![eof $f1]} {
        # NB: the 'ne' operator is currently slower for this task
        if {[string compare [read $f1 $chunk] [read $f2 $chunk]]} {
            close $f1; close $f2
            return 0
        }
    }
    close $f1; close $f2
    return 1
}
...

Donal,

Would you not also want to fconfigure $f1 and $f2 with -buffersize $chunk?

Or would this not help?

(I figured it would configure the I/O subsystem to attempt read-aheads to keep the buffer full.)
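
To make the question concrete, here is a rough sketch of what I have in mind. The name filesEqualBuffered and the 1000000-byte clamp are my own assumptions, not part of Donal's code: as far as I recall, the fconfigure documentation limits -buffersize to about one million bytes, so passing the full 8 MB chunk size may not take effect as written. Whether this actually helps would need to be measured.

```tcl
# Hypothetical variant of filesEqual that also sets -buffersize.
# Assumption: -buffersize is limited to roughly one million bytes,
# so the requested chunk size is clamped before being used as a buffer size.
proc filesEqualBuffered {file1 file2 {chunk 8388608}} {
    # Files of different sizes can never be equal
    if {[file size $file1] != [file size $file2]} {
        return 0
    }

    set bufsize [expr {$chunk > 1000000 ? 1000000 : $chunk}]

    set f1 [open $file1 r]
    set f2 [open $file2 r]
    # Binary mode avoids encoding and end-of-line conversion
    fconfigure $f1 -translation binary -buffersize $bufsize
    fconfigure $f2 -translation binary -buffersize $bufsize

    set equal 1
    while {![eof $f1]} {
        # [string compare] returns nonzero as soon as the chunks differ
        if {[string compare [read $f1 $chunk] [read $f2 $chunk]]} {
            set equal 0
            break
        }
    }
    close $f1
    close $f2
    return $equal
}
```

Wrapping calls to both versions in [time] on the same pair of large files should show whether the larger buffer makes any difference.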

--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+