Re: Library Design, f0dder's nightmare.
- From: "f0dder" <f0dder_nospam@xxxxxxxxxxxxxxxx>
- Date: Mon, 19 Jun 2006 16:11:18 +0200
hutch-- wrote:
I am pleased to see that you have fixed another mistake after the
first demo mistake but I suggest to you that a sequence of blunders
on this scale while using your example as a model of how other people
should design their code hasn't sold the case you wish to flog over
time.
Mistake? Which mistake?
Not having TAB support in the first version is hardly a mistake; I've never
seen people use TAB as delimiters for a commandline, but okay - the VC++
argc/argv tokenizer supports it, so why not.
If I were to support more delimiters than that, I'd probably use a 32-byte
bit array and the BT* instructions... simple and elegant. A bit wasteful
unless you have a whole bunch of delimiters, though.
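For illustration, here is a minimal C sketch of that 32-byte bit array idea. The names `delim_set`, `delim_set_init` and `is_delim` are made up for this example and are not from the posted code. Bit n lives in byte n/8 at bit position n%8, which as far as I know is the same layout the x86 BT instruction uses when it indexes a bit string in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 256 bits, one per possible byte value: 32 bytes in total. */
typedef struct { uint8_t bits[32]; } delim_set;

/* Build the set from a NUL-terminated list of delimiter characters.
   (Hypothetical helper, not part of the original routine.) */
static void delim_set_init(delim_set *s, const char *delims)
{
    assert(s != NULL && delims != NULL);
    memset(s->bits, 0, sizeof s->bits);
    for (; *delims; ++delims) {
        unsigned char c = (unsigned char)*delims;
        s->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
    }
}

/* Bit test: returns 1 if c is in the set, 0 otherwise. */
static int is_delim(const delim_set *s, char c)
{
    unsigned char u = (unsigned char)c;
    return (s->bits[u >> 3] >> (u & 7)) & 1;
}
```

A caller would initialise the set once, e.g. `delim_set_init(&s, " \t")`, and then test each input byte with `is_delim(&s, ch)`. The lookup cost is the same whether the set holds two delimiters or fifty.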
Now rather than trying to use flawed examples to criticise other
people's work, if you bothered to properly develop the algo you could
get something like reasonable results for a very old architecture like
argv - argc that would be popular with people who want to use an old
architecture like this.
Old architecture? *shrug*. It's easy and convenient. Besides, this tokenizer
can be used for more than just argv tokenizing.
Using the stack may be easy in this context but it's not good design as
the application that uses the algo needs to compensate for the unknown
stack offset. You can do a lot better with just a little simple
arithmetic.
Huh? The stack is only used temporarily while creating the argv array.
A 32k command line can at most have 16k arguments (one character and
one delimiter each), so you dynamically allocate a 64k array to store
pointers. Feed it through your parser to get the offset of each
argument and overwrite the trailing delimiter with a zero.
Why limit yourself to 32k when there's no reason to do that? Why allocate
more memory than necessary? And I hope you're not suggesting to put zeroes
in the input buffer.
Write each offset to its appropriate place in the array of pointers and
when you have finished the scan of the buffer, either original or a
copy, you reallocate the buffer size back down to the last member
written to it so you are not wasting memory.
Why reallocate when you can allocate proper size from the start? :) I guess
you haven't even looked at my code. Or perhaps you just don't understand it,
which is also pretty likely.
You need to pass back 2 pieces of information, the argument count and
the pointer array offset. In assembler you would use two (2) registers
but if you need to use it from a high level language, you would
probably pass the address of a structure and write the two values to
it.
In assembly, for a general-purpose and reusable component, I'd still use
pointers, since the Intel established calling convention only returns values
in EAX. This is not a speed-sensitive routine where register passing would
make any sense.
You can do it even smarter by testing the command line length, and
dividing the byte count by two to determine the maximum pointer array
size so it uses even less memory.
I don't need to do that, since I allocate only the exact necessary amount of
memory.
This is not a particularly hard algo to code and if you bothered to
put it together in a reliable manner, it would be fast, safe and
convenient to use without offsetting the stack by an unpredictable
amount.
It's easy, yes... I wonder why you messed up so badly in your GetCL, then :)
The point of this routine isn't to be the fastest or most efficient in the
world, by the way. The idea was to demonstrate a way of handling the problem
that is dynamic without unnecessary reallocations, and furthermore to
demonstrate an interesting implementation that lends itself well to
assembly, but would be hard to implement in traditional HLLs like C/C++ and
Pascal.
The benefits of the routine are that it leaves the input buffer alone, only
does the necessary amount of memory allocations (avoids heap fragmentation),
and can handle pretty much arbitrary length input (no limit on total
input buffer length or individual token length).
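To make those properties concrete, here is a rough C sketch that gets the same externally visible behaviour: the input is left untouched, there is exactly one allocation of the exact size, and no length limits are built in. Note that this is NOT the stack-based technique of the posted assembly routine. It swaps in a plain count-then-copy scan, so it walks the input twice while tokenizing. The name `tokenize` and the space/TAB-only delimiter rule are assumptions for the example, and quote handling is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int is_space_or_tab(char c) { return c == ' ' || c == '\t'; }

/* Hypothetical helper: splits `input` on spaces and tabs without modifying
   it. Returns one malloc'd block holding the NULL-terminated pointer array
   followed by the token copies, or NULL on allocation failure.
   Release everything with a single free(). */
static char **tokenize(const char *input, size_t *argc_out)
{
    size_t count = 0, chars = 0;
    const char *p = input;

    assert(input != NULL && argc_out != NULL);

    /* Pass 1: count the tokens and the characters they contain. */
    while (*p) {
        while (is_space_or_tab(*p)) ++p;
        if (!*p) break;
        ++count;
        while (*p && !is_space_or_tab(*p)) { ++chars; ++p; }
    }

    /* One allocation: (count + 1) pointers, then the strings plus one
       terminating NUL per token. */
    char **argv = malloc((count + 1) * sizeof *argv + chars + count);
    if (!argv) return NULL;
    char *out = (char *)(argv + count + 1);

    /* Pass 2: copy each token and record where its copy starts. */
    size_t i = 0;
    p = input;
    while (*p) {
        while (is_space_or_tab(*p)) ++p;
        if (!*p) break;
        argv[i++] = out;
        while (*p && !is_space_or_tab(*p)) *out++ = *p++;
        *out++ = '\0';
    }
    argv[i] = NULL;
    *argc_out = count;
    return argv;
}
```

Usage would look like `char **v = tokenize(cmdline, &n);` followed by `free(v)` when done. Putting the pointer array and the strings in one block is what keeps it to a single allocation.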
The downside is that it touches input memory twice: once while tokenizing,
and once while copying to the output buffer. Also, with a standard win32 stack
size, it's limited to around 130,000 tokens. This is hardly a problem when
parsing commandlines though; that's 8 times more than the max possible with
an NT commandline, assuming 32k is the limit.
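The numbers can be checked with a bit of arithmetic, sketched below in C. Two things here are assumptions and are not stated in the post: a 1 MiB stack (the usual default reserve for a win32 executable) and 8 bytes of stack per token (e.g. a pointer plus a length). The function names are made up for the example.

```c
#include <assert.h>

/* Worst case for a command line: every argument is one character followed
   by one delimiter, so at most half the bytes can start an argument. */
static unsigned long max_args(unsigned long cmdline_bytes)
{
    return cmdline_bytes / 2;
}

/* Upper bound on tokens if each one needs bytes_per_token of stack.
   bytes_per_token = 8 is an assumption, not taken from the posted code. */
static unsigned long max_tokens_on_stack(unsigned long stack_bytes,
                                         unsigned long bytes_per_token)
{
    assert(bytes_per_token != 0);
    return stack_bytes / bytes_per_token;
}
```

Under those assumptions, `max_args(32768)` gives 16384 and `max_tokens_on_stack(1048576, 8)` gives 131072, which is in line with the "around 130,000" and "8 times" figures. The real limit is a bit lower, since the stack is also used for other things.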