Re: Tough interview question - any suggestions?



Josh Sebastian wrote:
Googmeister wrote:
Phlip wrote:
Denis wrote:

I got a tough interview questions lately, and I would like to hear
your opinion:

An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a
minimal size
of 2

Create an array of those, with p pointing into the string, and various sizes
of len. So C gets a { "C", 1 }, { "CF", 2 }, { "CFA", 3 }, and so on.

Lexically sort the array.

Not so good in terms of efficiency if you array is of length N^2.

Not necessary. You just need a single array of size (string length) -
(minimum substring length) + 1. You can evaluate the substrings in
"passes" which examine only substrings of a certain length. This
simplifies the string comparisons as well as reducing the memory usage.

I did almost exactly this for a real program I wrote a while back.
I found that not only is memory usage a problem, but it's also very
easy to get really bad behavior with the processor's cache. I was
using a hash to count all possible substrings of each length, and
I concluded that while on paper sorting might be worse (since the
hash can theoretically be O(N)), in reality sorting should be better
because with the right sort, you can get MUCH better locality of
reference. I even tested my code (that uses the hash) on a machine
with a much faster processor and only moderately faster memory than
the machine I had been using, and I found that the speedup was almost
exactly proportional to the improvement in memory access time, meaning
that the cache isn't helping.

For what it's worth, another way to save memory is to break it up
further into passes by only counting strings whose first character
is in a certain range. For example, one pass might do all strings
whose first character is in the range (0,127) and whose length is
10, and another pass might do remaining strings whose length is 10.
In my case, I was trying to find the one substring of a given length
that occurs most frequently, so that was a very big memory savings
for me.

- Logan
.



Relevant Pages

  • Re: Check for Common character sequence ( I will pay)?
    ... Write a function in C# that takes in an array of ASCII strings and finds ... common character sequences. ... more adjacent characters that appear in more than one string in the array. ... //count of times this substring ...
    (microsoft.public.dotnet.framework)
  • Re: String: how works?
    ... the char array is only copied if the array is ... because substring() reuses char array for speed. ... So the final total of memory chunks in the O.P.'s situation is four: the String objects for the literal, for the new String constructed from the literal, and for the substring, and a single charreferred to by all three. ...
    (comp.lang.java.programmer)
  • [Corrected and Updated] Re: General method for dynamically allocating memory for a string
    ... Richard - bringing up the issue of freeing or deleting memory twice. ... freeing or deleting given a NULL pointer more than once. ... this function will return a newly allocated string. ... // pointer to the substring on the heap ...
    (comp.lang.c)
  • Re: General method for dynamically allocating memory for a string
    ... int main{ ... is your responsibility to delete this memory to prevent a leak. ... // pointer to the substring on the heap ... char const *CreateSubstring(char const *const p, ...
    (comp.lang.c)
  • Re: General method for dynamically allocating memory for a string
    ... // Setting a null pointer to zero ensures you ... is your responsibility to delete this memory to prevent a leak. ... // pointer to the substring on the heap ...
    (comp.lang.c)