Re: Binary operations in Tcl [Re: set character at string position]
- From: Melissa Schrumpf <m_schrumpf_at_yahoo_com_NOT@xxxxxxxxxxxxx>
- Date: Sat, 07 Jan 2006 22:17:17 -0500
Uwe Klein wrote:
> Melissa Schrumpf wrote:
> > Using C, though often uglier, rids one of some overhead. For example,
> > I'm processing a file with 202407416 bytes of data in 1114 chunks of,
> > on average, 1.7Mb each. All I want to do is convert 16-bit data from
> > big-endian to little endian.
> you are loosing sight of the glue aspect of tcl
Heh, I never thought I'd be accused of that. I string together programs
using Tcl all the time. While I do write complete applications in it, I
often view it as "just damn easier to work with than `sh`" :-)
> exec dd if=srcfile.dat conv=swab of=destfile.dat
> or some derivative of this.
Well, yes, if it weren't for the fact that the files I'm working with
are a horrendous mess. Basically, the program I use for digitizing
audio writes Quicktime files. The Quicktime file format is, in some
respects, not unlike a binary XML scheme, so I first have to parse out
information about where and how the data is stored. Tcl works wonders
for this. Then, there's the ugly bit that the application that creates
these files often pads huge segments of the files with zeroes
(seriously, one file was twice as long as it needed to be, because it
was overly zero-padded.)
Yeah, I could parse it first, then `dd` it, but that's no more portable
than writing my own extraction code. (And, in a brief test, dd is still
about 40% slower.)
> nobody has been shot yet for using system utilities ;-?
True enough. I guess I was just looking for a "what are the secrets
that aren't really documented but if you know enough about how Tcl is
coded, you can leverage it" sort of answer.
See, there's a huge difference between what works, and what works
efficiently, and, in ANY language, that difference is rarely as well
documented as it might be.
My initial implementation (proof-of-concept) was read one byte into b0,
read one byte into b1, write b1 to output file, write b0 to output file.
Now, YOU know this is a big waste of time, and I know that this is a big
waste of time, but this is the sort of thing that a lot of new
programmers might not intrinsically comprehend. Hell, it's the sort of
thing that a lot of "seasoned" programmers might not expect.
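For comparison, a minimal C sketch of the chunked alternative to the byte-at-a-time approach: read a large block, swap each byte pair in memory, then write the block back out. The function name `swap16_stream` and the 64 KiB `CHUNK` size are assumptions for illustration, not taken from the program discussed here.

```c
#include <assert.h>
#include <stdio.h>

enum { CHUNK = 65536 };  /* assumed buffer size; must be even */

/* Copy `in` to `out`, swapping each adjacent pair of bytes.
   One fread/fwrite per chunk instead of one per byte.
   Returns 0 on success, -1 on an I/O error.
   A trailing odd byte, if any, is copied unchanged.
   Not reentrant: the buffer is static. */
int swap16_stream(FILE *in, FILE *out)
{
    static unsigned char buf[CHUNK];
    size_t n;

    assert(CHUNK % 2 == 0);  /* pairs must never straddle two chunks */
    while ((n = fread(buf, 1, CHUNK, in)) > 0) {
        size_t i;
        for (i = 0; i + 1 < n; i += 2) {
            unsigned char t = buf[i];
            buf[i] = buf[i + 1];
            buf[i + 1] = t;
        }
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

The chunk size is a tuning knob, not a recommendation; the right value would have to be found by timing it on the actual files.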
But where does one learn this? Where does one learn about the overhead
of system calls and I/O operations, things like memory-aligned data, and
cache hits, and the fact that most modern processors read data from
memory more than one byte at a time? They don't learn it by taking
courses that focus on high-level APIs, or by reading C references.
Sometimes it's something a grizzled coder tells you. Sometimes it comes
up in a code inspection. Sometimes you get it by reading the processor
spec, or the OS code.
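One way to act on the "more than one byte at a time" point, sketched in C under assumptions: load eight bytes into a `uint64_t`, swap the two bytes of each 16-bit lane with shifts and masks, and store the result. `swap16_words` is a hypothetical name. `memcpy` does the loads and stores so that unaligned buffers are safe. Whether this beats a plain byte loop depends on the compiler and CPU, so it has to be timed rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Swap each adjacent byte pair of src into dst, eight bytes per step.
   nb must be even. dst may be the same buffer as src. */
void swap16_words(unsigned char *dst, const unsigned char *src, size_t nb)
{
    const uint64_t lo = UINT64_C(0x00FF00FF00FF00FF);
    size_t i = 0;

    assert(nb % 2 == 0);
    for (; i + 8 <= nb; i += 8) {
        uint64_t x;
        memcpy(&x, src + i, 8);                 /* alignment-safe load */
        x = ((x & lo) << 8) | ((x >> 8) & lo);  /* swap within each 16-bit lane */
        memcpy(dst + i, &x, 8);                 /* alignment-safe store */
    }
    for (; i < nb; i += 2) {                    /* leftover pairs */
        unsigned char a = src[i], b = src[i + 1];
        dst[i] = b;
        dst[i + 1] = a;
    }
}
```

The mask-and-shift step only exchanges bytes inside each 16-bit lane, so the result is the same on big-endian and little-endian hosts.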
20 time trials on three different implementations:
A:
    for (i=0; i<nb; i+=2) {*(buf1 + i) = *(buf0 + i + 1);}
    for (i=0; i<nb; i+=2) {*(buf1 + i + 1) = *(buf0 + i);}
B:
    for (i=0; i<nb; i+=2) {
        *(buf1 + i) = *(buf0 + i + 1);
        *(buf1 + i + 1) = *(buf0 + i);
    }
C:
    memcpy((buf1+1), buf0, (nb - 1));
    for (i=0; i<nb; i+=2) {*(buf1 + i) = *(buf0 + i + 1);}
A: 10048325.05 microseconds per iteration
B: 8633765.6 microseconds per iteration
C: 7697434.1 microseconds per iteration
I fully expected this. But it certainly isn't obvious -- not from the
language and documentation alone. Method C copies 50% more data than
either of the other two methods, yet it offers a significant savings.
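To make the "50% more data" arithmetic concrete, here is a sketch that wraps methods B and C as functions so their output can be compared. The names `swap_b` and `swap_c` are hypothetical, and the sketch assumes `nb` is even and that the two buffers do not overlap. Method C moves `nb - 1` bytes with `memcpy` and then rewrites the `nb / 2` even positions, roughly `1.5 * nb` bytes in total.

```c
#include <assert.h>
#include <string.h>

/* Method B: swap each pair in a single pass. */
void swap_b(unsigned char *buf1, const unsigned char *buf0, size_t nb)
{
    size_t i;
    assert(nb % 2 == 0);
    for (i = 0; i < nb; i += 2) {
        buf1[i] = buf0[i + 1];
        buf1[i + 1] = buf0[i];
    }
}

/* Method C: shift everything up by one byte with memcpy, which puts
   every even-indexed source byte in its final odd slot, then fix up
   the even slots in a second pass. */
void swap_c(unsigned char *buf1, const unsigned char *buf0, size_t nb)
{
    size_t i;
    assert(nb % 2 == 0);
    if (nb == 0)
        return;                       /* avoid nb - 1 wrapping around */
    memcpy(buf1 + 1, buf0, nb - 1);   /* buf1[k + 1] = buf0[k] */
    for (i = 0; i < nb; i += 2)
        buf1[i] = buf0[i + 1];        /* overwrite the even slots */
}
```

Both functions should produce identical output for any even `nb`; only the access pattern differs.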
I'm just a lowly "user" of the Tcl platform. I don't know the ins and
outs of its design as well as I do other domains. It was news to me,
until a year or two ago, that using braces in [expr] would speed up
calculations:
% time {expr $a+$b} 50000
22.2757 microseconds per iteration
% time {expr {$a+$b}} 50000
2.21892 microseconds per iteration
Admittedly, this is actually documented, so it's entirely my fault that
I didn't know it.
Since we have all the experts here, I was hoping for some similar voodoo
insight into Tcl optimization, esp. with respect to the [binary] command.
Is there perhaps an optimal size to tune byte strings to for [binary]? Is
scanning and formatting using s* and S* the fastest way, or is there
some other trick? Is there some way that does not involve transforming
the data first to a list, then back to binary again? Should I simply
have opted for using [string index] and [append] to reverse the bytes?
(I doubt it, on the last one, because it seems it would involve an awful
lot of copying.)
--
MKS