# Re: Suggest improvements

Gus Gassmann <horand.gassmann@xxxxxxxxxxxxxx> writes:

I have a large vector i of integers and I need to identify repeated
values and runs (of length at least 3) in it. For instance,
i = 1 2 3 5 8 8 12 6 5 4
has a run of 3 (with increment 1), a singleton 5, a duplicated 8, a
singleton 12, and a run of 3 (with increment -1). (The decomposition
might not be unique, but that is a secondary concern.)

My approach has been to write a function to identify the first pattern
only, and to call this function repeatedly starting right after the
last identified pattern.

Good plan though it may have an impact if your secondary concern about
there being multiple decompositions ever comes to the fore.

In the above example, I would call the
function successively with
i[0],i[3],i[4],i[6], and i[7].

I am concerned about performance and would be grateful for insights
and suggestions.

This is not an expensive function and, at first sight, a fast version
and a slow version will likely be within a few percent of each other.
What algorithm is this going to be a part of where such differences
might be significant?

inline void getMultIncr(int* i, int *mult, int *incr, int size)
{

Start with it not being inline, then measure and see if making it inline
helps. Many modern compilers will be better than you are at deciding if
there is any value in inlining a function call.

I don't like to waste the possibility of a return value. I'd return the
"run length" (*mult in your case). It will probably simplify the code
a little.

// i contains a pointer to the (remainder of the) vector
// *mult is a pointer to the number of elements in the current pattern
// *incr is a pointer to the increment in the current pattern
// size contains the number of elements in the (remainder of the)
vector

int mark;
int k;

*mult = 1;
*incr = 0;

if (size == 1) return;

I might try to generalise the code to cope with size == 0 as well. It
often makes calling a function simpler if there are fewer special cases
to test before the call. With a version that returns the run length,
all you need is

if (size <= 1) return size;

//try to find duplicated values
mark = i[0];
for (k=1; k < size; k++)
{
if (i[k] != mark) break;

A loop with an initial break is often better written by adding the
(negated) test into the loop's condition:

for (k = 1; k < size && i[k] == mark; k++) ...

(*mult)++;
}
if (*mult > 1 || size == 2) return;

The size == 2 part looks wrong to me. Given i = {1, 2}; I'd say that

getMultIncr(i, &length, &inc, 2);

should set length == 2 and inc == 1. Your code will return with length
== 1 and inc == 0. However, I can see that there is scope for ambiguity
here -- you might want to call this two singletons though that seems odd
to me.

// now look for possible runs
*incr = i[1] - i[0];
if (i[2] - i[1] != *incr) return;

*mult = 3;
for (k=3; k < size; k++)
{
if (i[k] - i[k-1] != *incr) break;
(*mult)++;
}
return;
}

I see that there is no way to return a length of 2 with an increment, so
maybe you have decided that 1, 2 is really two singletons.

You can almost certainly reduce the number of special cases here.
Duplicate runs can be detected in the same way that you detect
incrementing sequences.

Looks like a fun function. If I get time I'll write one and we can
compare styles.

--
Ben.
.

## Relevant Pages

• Re: Suggest improvements
... last identified pattern. ... A loop with an initial break is often better written by adding the ... I see that there is no way to return a length of 2 with an increment, ... maybe you have decided that 1, 2 is really two singletons. ...
(comp.lang.c)
• Re: Borg vs Singleton vs OddClass