Re: Best Job Skill --> .NET or Java



From: Richard Heathfield <r...@xxxxxxxxxxxxxxx>
Perhaps slightly more constructively, bear in mind that the most
important skills to acquire are not in specific languages (helpful
though that undoubtedly is) but in robust, modular, high-performance
programming.

Well I have two out of three there. I don't generally bum the code
to make it "high performance". I use data structures such as hash
tables and binary search trees etc. as appropriate to make the
algorithms reasonbly fast, and in general if a program is too slow
it profits more from a new data structure using a new algorithm
rather than bumming the code to be "high performance". For example,
many years ago I wrote a function called "segmat", which finds the
longest sub-sequence in common between two given strings, then
recurses to find any next-longest sub-sequence in common among the
parts not already matched. Finally it rearranges all the matching
parts into the correct sequence according to one of the two
strings, the one regarded as "correct". It was brute force, which
worked fine for the short strings I was dealing with (user is asked
an English sentence with one word missing, and has to fill it in,
and the program shows which parts are correct, deletes the
non-matching parts, and then the user edits that to try to get
closer to the correct answer). For another application where the
matching parts absolutely must be in the same sequence within both
given strings, I wrote a variant "segmatseq", which was of course
faster because after the first string was found it had to recurse
down only before-match-before and after-match-after, never
considering cross-matches between before of one string and after of
the other string. But the same basic brute-force algorithm was
used, which was usually fast enough, usually 5-10 seconds when
working with strings of several thousand bytes long, but
occasionally it'd take over a minute for strings several tens of
thousands of bytes long, sometimes a minute and a half, and I
needed to process ten thousand different pairs, so I rewrote it to
use a more efficient algorithm, which kept the runtime to less than
three seconds for most pairs, less than ten seconds for virtually
every pair of strings, only occasionally going as long as half a
minute so very rarely that it wasn't an issue, and I completed
processing all ten thousand pairs in about three days of non-busy
time on the Unix ISP. (The runtime was now dominated by the *other*
tasks that had to be performed to obtain the two big strings that
were then segmatched.)

If you're curious what that new algorithm was that was about an
order of magnitude faster than the original brute-force algorithm:
It compiled a histogram of trigrams, eliminated all the trigrams
that didn't occur exactly once in each of the two strings, and from
those exact 1-1 matching trigrams it expanded the match as widely
as possible, and deleted from the TO-DO list any exact 1-1 match
that was already covered by the expanded match just completed. When
it gets all done, it sorts the expanded matches to find the
largest, and then recurses on the non-match before and after
string-pairs.

But now I want to make it even faster! The basic idea is *not* to
generate *all* the trigrams and then tally histogram and eliminate
non-uniques. Rather just pick one (1) random trigram at a time and
brute-force check if that is unique in both strings. As soon as it
finds one random trigram that is unique, it expands that one,
eliminates from further consideration any trigram-starting-index
that overlaps that expanded sub-string, then picks a new random
trigram from among the parts not already exclused. Trouble is I'm
having trouble thinking of a good data structure to make an
efficient algorithm for *both* eliminating entire often-large
segments from further consideration in a single action *and* then
picking new random sample from what's left, hence the other thread
I started yesterday, which nobody has responded to yet. But I think
I have a couple good candidates, as I mentionned in that thread.

A solid knowledge of algorithms and data structures is
indispensable to a modern programmer,

I have that, for normal data/processing algorithms, plus a good
aptitude for deciding which of the standard
datastructures+algorithms is suitable for a given task, and when a
task could benefit from crafting a whole new
datastructure+algorithm.

as is an attitude of defensive programming (because, as everyone
ought to know, if it can go wrong it will, and if it can't go wrong
it will still find a way)

Yup. I have that skill too. You wouldn't believe the number of
if somethingNotRight then ERROR "Argument to function not ..."
in my code, or for inner functions that should never be called
except from their parent function, where my code should *guarantee*
the data is correct, I still have checks like:
if somethingNotRight then ERROR "Bug: Argument to function not ..."
and where I believe my algorithm guarantees (within a single
function where arguments are already known correct) that some
assertion further down will be correct, but if there's a bug it
would crash some low-level utility with an inscrutable error
message, so I protect myself by checking that assertion before
calling the low-level utility, to reduce my debugging time if I've
made a stupid mistake whereby my code doesn't do what I intended it
to do. It's so much easier to diagnose:
ERROR: Didn't find keyword at start of RFC822 header line
Called from (PARSE-SUBJECT-FIELD " (was: I really love Heather Thompson)")
(Obviously a RFC822 continuation line that somehow didn't get
concatenated with the previous line, probably due to bug in the
header-line continuation-folding algorithm.)
rather than the inscrutable:
ERROR: Argument Y is not a NUMBER: NIL.
Called from (KERNEL:TWO-ARG-+ 9 NIL)
(Who has the slightest idea what caused that error, except after
studying the backtrace, and even then?)

and a facility with abstraction and modularity.

I have that too. Will you forgive me for not *already* having 3+
years producing commercial shrink-wrapped products, and hire me to
work for your company? I can usually program each use-case in 4
hours (Common Lisp) or 8 hours (Java), except if it requires a new
major datastructure+algorithm, which typically takes several days.
For example it took me only one day to write the swing GUI for a
flashcard program which simply presents each card in sequence and
waits for you to click the OK or XXX button before showing the
next, and then another day to expand that program to include the
"in-hand" flashcard-drill algorithm which is a crude approximation
to the delay-doubling algorithm, then another day to replace the
"in-hand" algorithm with the true delay-doubling algorithm. I
haven't yet written the next part, which saves current state to
disk after each test, and restores from disk when you start the
program, and uses different saved-state files for different
students, because my pool of Chinese people who want to use my
program for learning English has dried up (not one person except
myself has yet tried the program since I installed the true
delay-doubling algorithm), so there's really no incentive to do any
more work on the program in the foreseeable future. The CGI version
of the full delay-doubling algorithm for fill-in-missing-word with
SEGMAT has been online since late 2002, log into my Web site as
guest1 with password free if you want to try a demo.
.