Re: Programmer's unpaid overtime.
From: Calum (calum.bulk_at_ntlworld.com)
Date: 12/07/03
- Next message: Thomas Stegen: "Re: Language to Learn"
- Previous message: Gerry Quinn: "Re: Language to Learn"
- In reply to: Edward G. Nilges: "Re: Programmer's unpaid overtime."
- Next in thread: Programmer Dude: "Re: Programmer's unpaid overtime."
- Reply: Programmer Dude: "Re: Programmer's unpaid overtime."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sun, 07 Dec 2003 18:43:02 +0000
Edward G. Nilges wrote:
> Calum <calum.bulk@ntlworld.com> wrote in message
> news:<bqq4lm$onq$1@news7.svr.pol.co.uk>...
>
>> Edward G. Nilges wrote:
>>
>>> Programmer Dude <Chris@Sonnack.com> wrote in message
>>> news:<3FCFABA6.FA73EF3E@Sonnack.com>...
>>>
>>> What we need to see are not these single benchmarks but instead
>>> the execution time "curves" for the CJ and the EGN versions.
>>> There is a possibility that the version I wrote quickly and
>>> posted had a flaw, although my production version fails to
>>> exhibit NP complete behavior for large files.
>>
>> It seems only fair that you post an equation for the execution time
>> of your program to show us all how it's done.
>
>
> If you cache the parse of individual words the time to find a random
> word is D*S+(N-1)*K where D is the number of DISTINCT words, S is
> the initial search time for any word using brute force, N is an
> average number of times a typical word occurs and K is a constant
> that represents the roughly constant time it takes, to look up a
> keyed item in a hash table such as a Visual Basic collection.
I have not been following this thread very carefully. Is the task to
search a document for a word? You do it by indexing the document, he
does it by a simple search?
If you index a document, that would take k1.N time (N is the size of the
input, and k1 is a constant dependent upon your hardware and
implementation), assuming you are using a hash table. Then to look
something up in the index would take a constant time c. If you perform
S searches, the total cost T1 = k1.N + c.S.
Without an index, the search time is T2 = k2.N.S. So if the number of
searches S is small, you would expect T1>T2, since it is a lot of effort
to build an index, so k1>k2. But as soon as S grows, T1<T2.
But this assumes you don't need to rebuild the hash table before each
search of course, and your index is implemented sensibly.
> If you use a brute force search for each word and each of its
> occurences the formula becomes S*T where T is the total number of
> words, both distinct and the same.
>
> The paradox is that the latter formula S*T is "simpler" than the
> other formula but represents more of what the computer science
> professor means by complexity.
I don't think anyone thinks that S*T is a smaller number just because
the formula has fewer symbols!!!
> The cache solution's execution time formula is a function of the
> total number of distinct words (necessarily smaller than the total
> number of words) and the average times a random word repeats: the
> brute force solution's execution time formula is a function of the
> word count.
Yea, but storing each word is going to slow things down. To build your
index in the first place will also be proportional to the word count.
You don't know a word is a duplicate until you have read it.
> The operator of lowest precedence, the "major" operator, of the cache
> time formula D*S+(N-1)*K is addition. The operator of lowest
> precedence of the brute force formula S*T is multiplication. Unless
> something's hidden in the addition-based formula, this means that
> total execution time grows "more" slowly for the cache solution, at
> the expense of the use of more storage.
>
> Chris Sonnack's numbers show what may be but probably isn't a problem
> in this reasoning, but my main concern in the real world is not some
> absolute efficiency of a specific program, but the efficiency
> "trace", the efficiency "footprint" of a family of related solutions,
> which is always the real entity in the system life cycle as the
> user's needs change.
If your executions times are of the form k*N (anything more is
excessive), then your execution "curves" as you put it are both straight
lines. And given that his line is under your line, it means that his
will be faster _for all N_. Why build an index if you don't need it?
> I have an essentially verbal, and some may say verbose, understanding
> of the problem. I am certain that Richard Heathfield et al. will
> find it distasteful.
>
> This is because without knowing it, they are engaged in a postmodern
> attack on language itself, far more serious than the attack
> neoconservatives think is being made.
They are probably objecting to excessive verbiage (aka waffle). Why use
a hundred words when ten will do?
The problem is that natural language is imprecise. A technical
explanation will use a different style of English to a novel. But this
is more appropriate since it communicates more effectively.
See my later comment about jargon.
> In this attack, comp.programming is not about programmers (har!) or
> if it is they are to be subject to a new law of omerta, and must be a
> bruder schweigen.
A minority of programmers have big egos and superiority complexes.
Everyone's been the victim of the "you said something I disagree with,
therefore you are a moron" fallacy.
> The game becomes cutting pure code with an incommunicable
> understanding of the problem which in my book isn't an understanding
> at all.
>
> The campaign is part of what I consider the deprivation of language
> by politics in which narration and naming become a management
> prerogative over which clowns fight in this ng.
I was rather enjoying people comparing beard lengths!
> As in the cases where Richard Heathfield has informed me that I can
> no longer observe that (for example) that a community "has an
> understanding", the game becomes to strive for a language deprived of
> the ability to talk but at the dullest and most literal level.
> Paradoxically and because we still must deal with abstractions, the
> dull muttering that results is riven with self-contradiction and is
> in fact a language of DENIAL.
You object to jargon? This is one of the main objections "artists"
have of "scientists". They claim that scientists deliberately obfuscate
their work in order to perputuate their "superiority". In fact,
scientists cannot operate without jargon - you cannot explain everything
from first principles or you'd be there all day. Artists feel left out
simply because they do not have the background to fully appreciate an
exchange between scientists, and are doubly frustrated since scientists
can usually follow the gist of what artists are talking about. Artists
therefore assume they are the better communicators, while scientists
think that artists are just performing mental masturbation.
The same is true of computer science. Jargon is necessary, there is no
deliberate clique, I think you are doing the various posters I have come
to enjoy reading a disrespect.
Jargon facilitates communication.
> The result is a curious language in which certain keywords are
> uttered, not to push meaning forward but instead to give signals to
> Authority that one is O.K.
A uniform language makes life easier. If you choose to redefine
established phrases or invent new ones, then communication will break down.
<much stuff snipped>
>
>>> The fact remains that you seem to confuse genuine analysis of
>>> software complexity with a few benchmarks.
>>
>> Complexity? Tell me more! :-)
>
>
> There is as I hope you know a whole bunch of useful acamodemic work
> on software complexity which is the mathematical manipulation and
> analysis of the execution time formulae of algorithms and actual
> programs.
Indeed I do, I was merely wondering what your unique insights were.
> Chris Sonnack presented some interesting numbers. But I have no way
> of auditing those numbers. He may have made some stupid error.
But surely you could ask him for the source code instead of attacking him?
> In the example, I was trying in fact to show how bad C sucks. This is
> because it completely obscures the difference between code with
> state and code without state.
C often requires more verbiage and is more prone to bugs than higher
level languages. There are many areas where C would be the language of
choice, although its domain is shrinking. A lot of higher level
languages "suck" for many other reasons.
> If you create an array or cache in the C parser, you have to finger
> out where to put it. If you put it in "open code" outside of any
> function, where this area contained prior only #defines and
> structures that did not allocate structure instances, you have a New
> Thing which needs RAM and which within this RAM develops a state.
If your intention was to ridicule C (and presumably, people who believe
C is the apogee of programming language design), you picked a bad
example. C would be frightfully fast at text searching since all it
needs to do is mmap() the file into memory and pass a pointer over it.
C leaves the memory management to the programmer. Ideally a programming
language would unburden the programmer as much as possible, but
sometimes a finer control is needed. C programmers enjoy routinely
using malloc and free because, um er, it is faster or something, and a
good mental exercise remembering to manually free resources.
I use C++ because it has both low-level features, and some quite high
level ones. Therefore you can pick a style that suits the problem.
> This Thing becomes responsible for managing different strings from
> different callers. Over and above the individual cache for a string
> it is responsible for storing and looking up different input strings.
>
>
>
> Whereas there's a one-to-one relationship, tightly coupled, between a
> PowerString instance and its string.
But this is also C's strength. For example, your tokenizer does not
need to allocate a new string each time it matches a token. Instead it
can return pointers into another string. This is more efficient, at the
expense of a little more complexity.
C is a low level language. There are times when low level control is
needed, and times when it is not. Instead of saying "C sucks", say "C
is not for you." A higher level language would suit you, and the
problems you like to solve, better.
> Idiots, who talk about complexity metrics and mean lines of code or
> If nesting, need to understand mathematics including its limitations.
> Because a piece of code is complex or simple only in relation to a
> reader, you CANNOT measure software complexity, except as social
> research of the crudest sort which affects the phenomenon it would
> describe.
>
> "My code is simple: you code is [way too] complex" is narcissism pure
> and simple. And, while it doesn't exactly break my heart, companies
> have lost big bucks because schools do not teach programmers that
> 90% of their real job will be (1) understanding and then (2) changing
> somebunny else's code. The result is that the newbie (1) rewrites
> the code.
Striving for simplicity is not narcism. An overly complicated solution
is probably slower and contains more bugs, and takes longer to write and
maintain. Programmers should strive for simplicity.
- Next message: Thomas Stegen: "Re: Language to Learn"
- Previous message: Gerry Quinn: "Re: Language to Learn"
- In reply to: Edward G. Nilges: "Re: Programmer's unpaid overtime."
- Next in thread: Programmer Dude: "Re: Programmer's unpaid overtime."
- Reply: Programmer Dude: "Re: Programmer's unpaid overtime."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|