Re: Tough interview question - any suggestions?
- From: Logan Shaw <lshaw-usenet@xxxxxxxxxxxxx>
- Date: Mon, 06 Mar 2006 00:06:29 GMT
Josh Sebastian wrote:
Googmeister wrote:Phlip wrote:Denis wrote:
I got a tough interview questions lately, and I would like to hear
your opinion:
An array of N chars is given
Write an efficient algorithm to find all the repeating substring with a
minimal size
of 2
Create an array of those, with p pointing into the string, and various sizes
of len. So C gets a { "C", 1 }, { "CF", 2 }, { "CFA", 3 }, and so on.
Lexically sort the array.
Not so good in terms of efficiency if you array is of length N^2.
Not necessary. You just need a single array of size (string length) -
(minimum substring length) + 1. You can evaluate the substrings in
"passes" which examine only substrings of a certain length. This
simplifies the string comparisons as well as reducing the memory usage.
I did almost exactly this for a real program I wrote a while back.
I found that not only is memory usage a problem, but it's also very
easy to get really bad behavior with the processor's cache. I was
using a hash to count all possible substrings of each length, and
I concluded that while on paper sorting might be worse (since the
hash can theoretically be O(N)), in reality sorting should be better
because with the right sort, you can get MUCH better locality of
reference. I even tested my code (that uses the hash) on a machine
with a much faster processor and only moderately faster memory than
the machine I had been using, and I found that the speedup was almost
exactly proportional to the improvement in memory access time, meaning
that the cache isn't helping.
For what it's worth, another way to save memory is to break it up
further into passes by only counting strings whose first character
is in a certain range. For example, one pass might do all strings
whose first character is in the range (0,127) and whose length is
10, and another pass might do remaining strings whose length is 10.
In my case, I was trying to find the one substring of a given length
that occurs most frequently, so that was a very big memory savings
for me.
- Logan
.
- References:
- Tough interview question - any suggestions?
- From: Denis
- Re: Tough interview question - any suggestions?
- From: Phlip
- Re: Tough interview question - any suggestions?
- From: Googmeister
- Re: Tough interview question - any suggestions?
- From: Josh Sebastian
- Tough interview question - any suggestions?
- Prev by Date: Re: Hash Function problem
- Next by Date: Re: Tough interview question - any suggestions?
- Previous by thread: Re: Tough interview question - any suggestions?
- Next by thread: Re: Tough interview question - any suggestions?
- Index(es):
Relevant Pages
|