Re: Comments suck [Was: Length of variable names affect performance?]



In article <Pine.LNX.4.60-041.0504220035500.9843@xxxxxxxxxxxxxxxxxxxxx>,
Arthur J. O'Dwyer <ajo@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
>On Tue, 12 Apr 2005, Dave Vandervies wrote:
>>
>> Phlip <phlip_cpp@xxxxxxxxx> wrote:
>>> Post some code that needs a comment.
>>
>> Here's some code, with the comments removed and most of the identifying
>> marks filed off, taken from a module that does fuzzy matching ("is there
>> anything close to this?") in 2-dimensional space.
>>
>> What's it doing, how is it doing it, and why was it not done differently?
>> How would you add that information to the code without comments?
>
> I think I see what it's doing: It's adding one point to a binary search
>tree of some kind, whose entries are ordered by increasing X or Y
>coordinate, depending on which coordinate has the higher variance (or some
>quantity similar to variance).

Not correct, but headed in the right direction.

This is a quadtree-based lookup structure that, because of the lack of
a way to dynamically balance the tree, keeps multiple data points in
each leaf node to reduce the depth-increasing effects of poorly chosen
insertion orders. (Internal nodes contain only metadata (pointers to
leaf nodes and positions of the dividing point); all the actual data
lives in leaf nodes.)

transmogrify gets called by internal_leaf_insert when the leaf is full,
and splits it into two (ideally, both exactly half-full) leaf nodes
with the struct containing data for the old leaf node now containing
the internal node containing the two new leaves.
(The direction of the split is determined by which axis the leaf's data
has a higher variance along.)


> Maybe. I'm not clear on the uses of 'upper', 'lower', and 'points'.
>We're at a disadvantage by not being able to see the whole program, and
>see how the structures are defined and used.

Yep. Part of my point was that comments (when used properly) are a better
way to make code locally readable than trying to eliminate comments and
use globally meaningful names as a replacement.

points is the set of data points stored by a leaf node. upper and lower
are the subtrees of an internal node (upper contains values above the
boundary value, lower contains values below).


> (xsum / MAX_LEAF_POINTS) is the mean of the X coordinates, which is a
>logical place to put the root of the tree. 'MAX_LEAF_POINTS' is an awful
>name for "the number of nodes in the tree," which I think is what it
>represents. I can't figure out how it gets its value, or where it's
>defined. My original hunch was that it represented the maximum size of
>a collection of points, but that doesn't jibe with how you're using it.

MAX_LEAF_POINTS is the size of the array of points in a leaf node.
When this fills up, it's time to split it up into two leaf nodes. So it's
the maximum size of the subset of a collection that a leaf node can hold,
not the maximum size of an entire collection.


> All the setting of "member-function" fields really ought to be relegated
>to a function, say 'nonleaf_init_fields(n)'.

Not unreasonable, but here (for internal nodes) and in the create_new_node
function (for leaf nodes) are the only places it happens, so putting it
inline (with a comment saying that that's what it's doing, for high-level
readability) is consistent with good practice guided by a different set
of tastes and styles.


> As to /why/ it's being done this way, I have no idea. Particularly, I'm
>baffled by the last few lines of the routine, which seem to indicate that
>the algorithm is "Tear down the old tree and insert all the points into
>a new tree. Then insert the new point, and return." What's wrong with
>making this the one-liner 'n->internal_insert(n, p)'? Unless
>'internal_insert' doesn't do what I think it says it does...!

This part is actually an ugly wart that I'm not very happy with, mostly
because it had to be added after most of the code was there and I didn't
have time to do it properly.
The ids[] array is dynamically allocated (because it needs to be
arbitrarily resizeable) and contains a set of ID values associated with
the point; since the code was quite stable by the time that was put in and
there weren't many places that actually had to allocate and deallocate
points, it Was Decided (rightly, with the non-code constraints that had
to be met) that it would be easier to leave the wart in than to handle
this properly.

internal_insert inserts a single point, so aside from that wart all it's
doing is inserting all of the values from the old leaf node and then
the new value that the old leaf node didn't have room for, and letting
the internal-node insert function handle deciding which of the new leaf
nodes they go in.


> Anyway, here's my whack at it.

Not bad. Now (to bring us back onto my original point in posting it
without comments) how much of what I've said above (almost all of which
is documented in the comments) can you (or the peanut gallery) think of
a good way to document in the code without comments?


dave

--
Dave Vandervies dj3vande@xxxxxxxxxxxxxxxxxxx
What you have in mind about trees is extremely obscure. They come in green,
red-black, avl, binary, B, deciduous, non-deciduous, tropical, fruit,
ornamental, just to name a few varieties. --CBFalconer in comp.lang.c
.



Relevant Pages

  • Re: Approaching the infinite binary tree
    ... An infinite complete binary tree is a tree with Aleph_0 levels, ... path ends in a leaf node, but in the complete infinite binary  tree none ... just as endless strings of charactes do not ...
    (sci.math)
  • Re: Cantor and the binary tree
    ... >>> This no more proves countability of the set of paths than the fact that ... >>non-empty subsets) of paths in the tree. ... path ends in a leaf node, which are half the nodes in the tree. ... one node that represents the root path. ...
    (sci.math)
  • Re: Cantor and the binary tree
    ... >> Each leaf node corresponds to a terminating ... >> non-terminating binary proper fraction. ... When one has a finite binary tree every maximal path ends in a leaf node. ... That makes no more sense than to argue that there must be gaps between ...
    (sci.math)
  • Re: Cantor and the binary tree
    ... The set of all such lists, like the set of all those maximal paths, is uncountable. ... path ends in a leaf node, which are half the nodes in the tree. ... You start with one node that represents the root path. ... If every path shares the root node, eachfinite path will also have a leaf node, so there is at least one extra node. ...
    (sci.math)