Library Design, the script kiddie's nightmare.



There has been a general rule of thumb for many years when writing
assembler code: if you need something that does a job in a particular
way, roll your own, and if you write it properly it will do the job you
wanted it to do. This is a point that Frank made recently and it holds
good across the board in assembler programming.

With languages that have the capacity, libraries of reusable code make
a vast number of tasks easier and faster to write, and assemblers are
no exception as most are library capable. Their use comes at a cost
that some of the older assembler programmers will not accept: it
structures code in a particular way, based on what is commonly known as
a "call tree".

The virtue of writing code in this modular format is that you can use
library procedures as components to build the "call trees" but this
places very specific demands on the design of the library to ensure it
does not trap the user of the "call tree" with a pile of redundant
garbage that slows down their code and bloats the code with junk that
is not used.

There are a number of approaches to doing this that are well understood
among assembler programmers. Bash the procedure to death to get it
smaller, faster or better by whatever other criterion is deemed useful.
If it's a very short procedure, rewrite it without a stack frame to
reduce the call/stack overhead. If the basic library design allows it,
reduce the level of duplicate code by having procedures call other
procedures that already perform the required task.

This is all hack stuff to programmers that have been around for any
length of time who know how to write reliable procedures and library
modules but it does involve understanding enough about how a call tree
works and writing code that is both efficient and viable to use in more
complex constructions.

This is where the script kiddie's nightmare comes into play. Instead of
bothering to learn how to write code that does what it is supposed to
do, or bothering to read the documentation for a library procedure and
the source code for each module if it's available, there is this
growing expectation that someone else will do it all for them.

There is of course a market that caters exactly for people who want to
write code at this skill level: JAVA, VB, the MFC end of C++ and a
number of other very high level languages. They come at the price of
getting what you pay for, and that is the bloat, hand holding, design
restrictions and generally poor performance associated with this style
of very high level language.

The problem with the script kiddies that want to write assembler or C
code in the same manner is that they want their cake while also being
able to eat it, sell it or put it aside for a rainy day but the world
does not work this way. To write low level languages you have to know a
lot more than how to slop around MFC, draw a button on a form in VB or
pop pretty pictures in JAVA.

Memory allocation, array member counts and buffer lengths are things
that low level programmers already know about, and when properly
understood they use all of them to their normal performance advantage.
It's not exactly entry level programming though, and it does mean
understanding what the consequences are of making a mess of any of
these things.

Security issues are seen by many as important design considerations in
modern code, and the very high level approach is to try and encapsulate
ever larger blocks of code with a variety of hand holding techniques to
make it idiot proof. It runs into two separate problems: there will
always be a better idiot, and people who go looking for exploits
generally know how to get around the pre-canned methods if they can be
exploited. The next problem of course is that code to do this gets big,
buggy and sloppy real fast and is often not worth using.

The script kiddie has to hope that the high level language designer
holding their hot little hand has fixed it for them, but the low level
programmer can approach potential security exploits on a case by case
basis and can generally do a lot better because of it. A simple example
is putting an unprotected recursive quick sort up on the internet.
Someone feeds it a million identical values and it goes quadratic,
locking up the processor until it overflows the stack.

This is not a "hand holding" issue but a matter of selecting the right
algo that does not have this vulnerability. It's normally done with a
recursion depth indicator that collapses back to another sorting method
which handles non random data orderings.
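
As a rough sketch of the depth indicator idea, here is one way it could
look in C (used here only for illustration). The names protected_sort
and guarded_qsort, the budget of about 2*log2(n) levels and the choice
of heap sort as the fallback are all assumptions made for the example,
not something taken from a particular library.

```c
#include <stddef.h>

/* Restore the max-heap property below `root`, looking only at the
   first `n` elements of `a`. */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n) return;
        if (child + 1 < n && a[child] < a[child + 1]) child++;
        if (a[root] >= a[child]) return;
        int t = a[root]; a[root] = a[child]; a[child] = t;
        root = child;
    }
}

/* Fallback sort (assumed choice): heap sort is O(n log n) on any
   input ordering and uses no recursion. */
static void heap_sort(int *a, size_t n)
{
    if (n < 2) return;
    for (size_t i = n / 2; i-- > 0; ) sift_down(a, i, n);
    for (size_t end = n - 1; end > 0; end--) {
        int t = a[0]; a[0] = a[end]; a[end] = t;
        sift_down(a, 0, end);
    }
}

/* Deliberately naive partition around the last element. On identical
   values it always splits into 0 and n-1 elements, the bad case. */
static size_t partition(int *a, size_t n)
{
    int pivot = a[n - 1];
    size_t store = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (a[i] < pivot) {
            int t = a[i]; a[i] = a[store]; a[store] = t;
            store++;
        }
    }
    int t = a[store]; a[store] = a[n - 1]; a[n - 1] = t;
    return store;
}

/* Quick sort that carries a recursion budget down the call tree. */
static void guarded_qsort(int *a, size_t n, unsigned depth_left)
{
    if (n < 2) return;
    if (depth_left == 0) {      /* budget spent: collapse to fallback */
        heap_sort(a, n);
        return;
    }
    size_t p = partition(a, n);
    guarded_qsort(a, p, depth_left - 1);
    guarded_qsort(a + p + 1, n - p - 1, depth_left - 1);
}

/* Entry point: allow roughly 2*log2(n) levels of recursion. */
void protected_sort(int *a, size_t n)
{
    unsigned budget = 0;
    for (size_t m = n; m > 1; m >>= 1) budget += 2;
    guarded_qsort(a, n, budget);
}
```

With this guard the stack depth stays bounded by the budget no matter
what data is fed in, and the degenerate input costs a few wasted
partition passes before the fallback takes over rather than a quadratic
run.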

Then you get the example of a command line buffer overflow, which has a
number of considerations in how you deal with it. A command line parser
is not a string length algo, yet all the programmer writing the code
needs to do is check the system supplied command line length and, if it
exceeds the size allocated for the command line buffer, reject it. Now
of course the far more important issue with command line buffer
overflows is that if someone can get through the security of your
computer and get at your command interpreter without your
authorisation, you have far larger problems than a buffer overflow.
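
The length check itself is only a few lines. The following is a minimal
sketch in portable C, assuming the system has already handed over the
command line as a zero terminated string. The function name
accept_command_line and its return convention are made up for the
example.

```c
#include <stddef.h>
#include <string.h>

/* Copy `cmdline` into the caller's buffer only if it fits, terminator
   included. Returns 0 on success, -1 if the input is rejected.
   (Hypothetical helper, not part of any system API.) */
int accept_command_line(char *buf, size_t buf_size, const char *cmdline)
{
    if (buf == NULL || cmdline == NULL || buf_size == 0)
        return -1;

    size_t len = strlen(cmdline);   /* length as supplied by the system */
    if (len >= buf_size)            /* no room for text plus terminator */
        return -1;                  /* reject, never truncate silently  */

    memcpy(buf, cmdline, len + 1);  /* safe: length was checked first   */
    return 0;
}
```

The parser then only ever works on the checked copy, so it does not
need any length logic of its own.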

What the low level programmer does on a general level is design their
"call tree" by whatever method they like and enclose it with whatever
protection methods they deem appropriate to the context that the app
will be used in. Taking responsibility for an application's
architecture and writing reliable code may well remain a nightmare for
script kiddies who are too lazy to bother to learn proper low level
programming, but low level programmers do this all the time and pick up
all of the advantages of writing their own code without needing someone
to hold their hot little hand.

Regards,

hutch at movsd dot com
