Re: Library Design, f0dder's nightmare.



smile,

Problem was fixed, anyway. Lousy code design? Hardly. Lousy code design is
imposing artificial restrictions on your users, and being prone to crashes
even when the user abides by those restrictions.

For someone who claims to be the judge of other people's code design,
you built your example with at least two errors in it, a buffer
limitation in your display while claiming that you were showing how to
avoid a buffer limitation and the claim of being a general purpose
replacement for an algo that handled tabs in 1998 when it was written.

Both fixes took less than a minute to implement.

The errors were in place while you were trying to be the judge of other
people's code, and fixing a couple of blatant blunders after the fact does
not do the job for you here.

The GetCL overflow was reported more than a year ago - now, where's the
GetCL fix?

What fix? The code has not and will not be modified as it does what it
was designed to do. When Jibz raised the issue a couple of years ago
due to an OS change, I posted the code to control such a problem without
interfering with the interface of the old algo which is the right way
to do it. Even the slightest change in the interface would break some
massive amount of code written using it while it is easy enough for a
user to add a length test if their app needs it.
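
As a rough illustration of the caller-side length test described above, here is a minimal sketch in C. The name cmdline_fits is hypothetical and is not part of GetCL or any published library; the idea is only that the caller checks the command line against its own buffer size before handing it to an unchanged legacy routine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any published library.
   Returns 1 if the whole command line, including its terminating
   zero, fits in a buffer of buf_size bytes. No single argument can
   be longer than the full command line, so this is a conservative
   test a caller can run before invoking a fixed-buffer routine. */
int cmdline_fits(const char *cmdline, size_t buf_size)
{
    return cmdline != NULL && strlen(cmdline) < buf_size;
}
```

A caller that gets 0 back can refuse the input or allocate a larger buffer, and the old routine's interface stays exactly as it was.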

The lesson you need to learn here is never ever break existing code by
changing an interface of a published algorithm.

If I were to support more delimiters than that, I'd probably use a
32-byte bit array and the BT* instructions... simple and elegant. A
bit wasteful unless you have a whole bunch of delimiters, though.
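
For readers unfamiliar with the idea, a 32-byte bit array holds one bit for each of the 256 possible byte values. The sketch below shows the same membership test in C rather than with the BT* instructions; the type and function names are made up for illustration and come from neither poster's code.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative names only. 32 bytes * 8 bits = one bit per byte value. */
typedef struct { uint8_t bits[32]; } delim_set;

/* Clear the set, then set the bit for each character in delims. */
void delim_set_init(delim_set *s, const char *delims)
{
    memset(s->bits, 0, sizeof s->bits);
    for (; *delims != '\0'; ++delims) {
        unsigned char c = (unsigned char)*delims;
        s->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
    }
}

/* Returns 1 if c is in the set, 0 otherwise (the job BT would do). */
int delim_set_has(const delim_set *s, unsigned char c)
{
    return (s->bits[c >> 3] >> (c & 7)) & 1;
}
```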

Bit manipulation is slow.

Depends on the architecture you run the code on.

When you are writing x86 assembler code, no it does not. Bit
manipulation in x86 is slow and can be done in far more efficient ways
if specifically bit sized data is not required. This is another
reversion to ancient architecture at the price of good design.
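
One reading of "far more efficient ways" is a table with a whole byte per character instead of a bit, so a test is a single indexed load with no shifting or masking. The sketch below assumes that reading; the names are hypothetical, it is written in C rather than assembler, and it makes no timing claim for any particular processor.

```c
#include <string.h>

/* Illustrative names only. One byte per possible character value:
   table[c] is 1 if c is a delimiter, 0 otherwise. */
void delim_table_init(unsigned char table[256], const char *delims)
{
    memset(table, 0, 256);
    for (; *delims != '\0'; ++delims)
        table[(unsigned char)*delims] = 1;
}

/* Length of the leading run of non-delimiter characters in s. */
size_t token_length(const unsigned char table[256], const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !table[(unsigned char)s[n]])
        ++n;
    return n;
}
```

The trade-off against the bit array is size: 256 bytes instead of 32.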

While my tokenizer is relatively flexible, it was written as a replacement
for GetCL, not high speed tokenizing. Speed is relatively irrelevant in that
context, correctness isn't. Not that you'd be able to tell the difference in
speed parsing a commandline anyway.

Yes but you can tell the difference of imposing an ancient architecture
on requirements where there is no need to load the arguments into an
array at all, many applications can simply do their command line
comparisons on the fly with no array storage at all.
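
A minimal sketch of what an on-the-fly comparison could look like, in C. The function name is hypothetical, only spaces and tabs are treated as separators, and quoted arguments are deliberately not handled; it is an illustration of the idea, not anyone's posted code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if flag appears as a complete
   space/tab separated word in cmdline, 0 otherwise. Nothing is
   copied or stored: each word is compared where it sits. */
int cmdline_has_flag(const char *cmdline, const char *flag)
{
    size_t flag_len = strlen(flag);
    const char *p = cmdline;

    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')          /* skip separators */
            ++p;
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t')
            ++p;                                 /* find end of word */
        size_t len = (size_t)(p - start);
        if (len > 0 && len == flag_len && memcmp(start, flag, len) == 0)
            return 1;
    }
    return 0;
}
```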

What's the guarantee the limit will stay 32k? After all, the limit did
increase from 9x to NT, remember? And again... why limit yourself to 32k
when there's no reason for that limit?

It's possible that 256 bit Unix will have a terabyte command line which
will be too restrictive for some users, but each OS instance imposes
what it is designed to do, and 32 bit Windows from NT based systems uses
32k. You are confusing general purpose parsers with command line length
limits.

Why allocate more memory than necessary? And I hope you're not suggesting
to put zeroes in the input buffer.

You may have this fetish about leaving the command line buffer
unmodified but the OS does not specify it as read only so there is no
reason not to directly modify it.

Only if you scan it twice, once to get the argument count, the second
to load the data. It does not appear by immaculate conception.
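
To make the two-scan cost concrete, here is a minimal C sketch of an exact-size split: the first pass counts the arguments, the second allocates and copies them. All names are hypothetical, separators are only spaces and tabs, and quoting is ignored; it is not the tokenizer under discussion.

```c
#include <stdlib.h>
#include <string.h>

static int is_sep(char c) { return c == ' ' || c == '\t'; }

/* Pass 1: count space/tab separated words. */
size_t count_args(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        while (is_sep(*s)) ++s;
        if (*s == '\0') break;
        ++n;
        while (*s != '\0' && !is_sep(*s)) ++s;
    }
    return n;
}

/* Frees a NULL-terminated array produced by split_args. */
void free_args(char **argv)
{
    if (argv == NULL) return;
    for (char **p = argv; *p != NULL; ++p) free(*p);
    free(argv);
}

/* Pass 2: allocate exactly count+1 pointers and copy each word.
   Returns NULL on allocation failure. */
char **split_args(const char *s, size_t *argc_out)
{
    size_t argc = count_args(s);
    char **argv = calloc(argc + 1, sizeof *argv);  /* last slot stays NULL */
    if (argv == NULL) return NULL;

    size_t i = 0;
    while (*s != '\0') {
        while (is_sep(*s)) ++s;
        if (*s == '\0') break;
        const char *start = s;
        while (*s != '\0' && !is_sep(*s)) ++s;
        size_t len = (size_t)(s - start);
        argv[i] = malloc(len + 1);
        if (argv[i] == NULL) { free_args(argv); return NULL; }
        memcpy(argv[i], start, len);
        argv[i][len] = '\0';
        ++i;
    }
    *argc_out = argc;
    return argv;
}
```

The alternative argued for in this post is a single scan into a buffer sized for the worst case, trading memory for the second pass.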

Compare that to the amount of runs through the commandline your GetCL would
take to process all arguments... ^_^

You are confusing a dedicated command line algo written in 1998 with a
general purpose parser. For the latter unrelated to processing command
lines, your code is awkward and slow and it is related to its ancient
architecture.

Yes, I had a terrible problem finding your mistake in tab handling. It
took nearly 5 seconds.

Heh. Apparently you didn't understand the code, though.

Your missing tab handling did not need much understanding. :)

It seems you're applying double standards. At one time of day you're all
"omfg assembly has no limitations!11!", another time of day you're "omfg you
should follow standard coding conventions!!11!". Make up your mind.

You are confusing your own high level mindset with published register
conventions and assembler capacity. All three (3) volatile registers
can be used for return values.

the standard convention so the code can be used from any language that can
do C bindings, and I'm not losing any sleep over it since it's not a
bottleneck.

This translates to you have restricted the code to a high level
language mindset.

Reliably? Crashing at a too long commandline is hardly reliable. Again, you
don't have a documented limit on max commandline length, so even users
following your specification are prone to crashes.

Microsoft specify the results of an undersized buffer as "undefined".
Translate this to "unhandled exception" and you will understand that
component code should not be filled with junk that almost exclusively
does not apply. In each context the user of the module is better
informed than you are and buffer length control is better placed in
their hands than yours.

Your underlying assumption is that every module should be filled with
security junk and this demonstrates that you have no concept of library
design where components are built into a call tree and the programmer
controls that call tree as they wish.

At best when you get it right, the approach you are trying to use can
produce one algo at a time in an encapsulated form that is not suitable
for building a call tree of various procedures because of the embedded
junk that may not even do what the programmer requires. Effectively
sloppy high level code design that assumes that the programmer needs to
have their hand held by someone who probably knows less about the code
than they do.

The point of this routine isn't to be the fastest or most efficient
in the world, by the way.

The point is your algo is both slow and awkward to use when so many
applications of parsers have no use for array storage at all. It comes
from trying to ape an ancient architecture instead of designing your
own.

Consider a DLL that needs to parse the commandline. Or a generic CrashDump
that dumps the commandline. Or tokenizing a read-only memory mapped file.

Imposing limitations of this type on what is supposed to be general
purpose code is another form of applying the lowest common denominator
to code design when it is known to produce crap.

True enough. And even in the 80386 days, wasting 32kb wasn't much of a
problem. It's surprising seeing an assembly "programmer" that advocates
bloat and slop, though.

In the DOS days, you may have worried about 32k but with 32 bit Windows
and later coming up, it is far more efficient to use temporary memory
than to scan a buffer multiple times when it is not needed.

But compare the tokenizer to your GetCL routine, and the amount of memory
passes that one does.

GetCL is not a general purpose tokeniser, it is a command line parser
written in 1998 for Win9x.

If you want to parse general purpose text or more dedicated
requirements like code, forget ancient argv - argc designs and write a
high speed left to right table driven parser. Paddling around in this
half understood ancient junk is a waste of time.
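
For anyone wondering what a left to right table driven parser looks like in outline, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the two character classes, the two states, the callback type and all the names. A real parser for code would add classes and states for quotes, comments, operators and so on, and an assembler version would index the same tables directly.

```c
#include <stddef.h>

enum { CL_SEP, CL_TEXT, CL_COUNT };    /* character classes */
enum { ST_GAP, ST_TOKEN, ST_COUNT };   /* scanner states    */

/* Called once per token with a pointer into the text and a length. */
typedef void (*token_fn)(const char *start, size_t len, void *ctx);

/* next_state[current state][character class] */
static const unsigned char next_state[ST_COUNT][CL_COUNT] = {
    /* ST_GAP   */ { ST_GAP, ST_TOKEN },
    /* ST_TOKEN */ { ST_GAP, ST_TOKEN },
};

/* Every byte is text unless it is listed in seps. */
void build_class_table(unsigned char cls[256], const char *seps)
{
    for (int i = 0; i < 256; ++i) cls[i] = CL_TEXT;
    for (; *seps != '\0'; ++seps) cls[(unsigned char)*seps] = CL_SEP;
}

/* Single left to right pass. Tokens are reported in place through
   emit (which may be NULL); nothing is copied or stored in an array.
   Returns the number of tokens seen. */
size_t scan_tokens(const unsigned char cls[256], const char *text,
                   token_fn emit, void *ctx)
{
    size_t count = 0;
    int state = ST_GAP;
    const char *start = text;
    const char *p;

    for (p = text; *p != '\0'; ++p) {
        int next = next_state[state][cls[(unsigned char)*p]];
        if (state == ST_GAP && next == ST_TOKEN) {
            start = p;                           /* token begins */
        } else if (state == ST_TOKEN && next == ST_GAP) {
            if (emit) emit(start, (size_t)(p - start), ctx);
            ++count;                             /* token ends   */
        }
        state = next;
    }
    if (state == ST_TOKEN) {                     /* token runs to end of text */
        if (emit) emit(start, (size_t)(p - start), ctx);
        ++count;
    }
    return count;
}
```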

Regards,

hutch at movsd dot com
