Re: "Reversed" LCS problem

From: Arthur J. O'Dwyer (ajo_at_nospam.andrew.cmu.edu)
Date: 01/13/04


Date: Mon, 12 Jan 2004 19:10:57 -0500 (EST)


On Tue, 13 Jan 2004, Wieslaw Lubas wrote:
>
> Uzytkownik "Thad Smith" <thadsmith@acm.org> napisal w wiadomosci[...]

  By the way, I think there exists a newsgroup pl.comp.programming,
or something like that, for Polish speakers. Have you tried posting
to that group? (I'm assuming you're Polish, given the name, the
email address, and the 'napisal w...' line. :) Then you might not
have to worry so much about defining "substring" versus "subsequence,"
et cetera.
  Also: This response is really long, and contains several different
points. Don't feel you have to answer them *all* in one response.
I also finish up by making a guess at what you might really be wanting;
if it's correct, I would recommend starting a new thread with a better
title, just in case anyone knowledgeable has stopped reading this one.

> > > Maybe very simple example:
> > >
> > > string1 abbcccddddf
> > > string2 abbdddeeee
> > > ...
> > > stringn abbbeeeeffffff

  Don't use "..." unless you really mean "and so on, up to..." It
looked like you meant that we should infer, or figure out, the
values of string3, string4, ..., string[n-1] from what you'd written
so far -- which is impossible!

> > > answer for string 1 and 2 are:
> > > if k=0
> > > ccc
> > > if k=1
> > > bccc or cccd
<snip>
> > > Answer is:
> > > - continous
> >
> > I don't know what you mean here by "continous" (or continuous).
>
> continous means substring NOT subsequence

  "Substring" and "subsequence" are such similar words that it's
confusing to anyone not versed in the notation, and unintuitive even
if one *can* understand it. You could say "contiguous," or, better,
say "the letters in the string must be next to each other, not
separated by other letters." Better to look condescending than to
look reticent, IMHO.

> > > - for k=0 have no character common with other string
> >
> > I think I understand that.
> Good

> > > - for k=1 have no more than 1 character common with other string
> > > - for k=2 have no more than 2 character common with other string
> > > ...
> >
> > That makes sense. Again, you seem to be looking for a maximum length
> > substring of string 1 that has
[at most]
> > k characters which exist in string 2.
> > You are NOT looking at "substrings of other strings".
> YES.

> > After showing examples that seem to make sense, you further say
> >
> > > In advance, i need to search about 15 chars length substring of
> > > string 1, that is not substring of other strings, and having as
> > > low as possible common "substrings of this substring" with other
> > > n-1 strings.
<snip>
> > Here you seem to be searching for a substring of a GIVEN length which
> > has the lowest number of common characters from the other strings, not
> > maximum length for a given number of common characters, which is a
> > related, but different objective. Also, if you really mean substrings
> > of the other strings and not just number of characters matching,
> > regardless of location in the string, you need to explain that
> > carefully, because I don't see your examples supporting it.
>
> ABOUT doesn't mean GIVEN, right?

  "Given," in a formal [mathematical] context, means "specific" or
"known." That is, you have been _given_ the number 15 from somewhere,
and you're trying to find a substring of that "given" length.
  Don't say "about 15." That's completely meaningless in this context;
how do you write a program to tell whether a number is "about 15"?!

> I want find all , mutually exclusive, maximized by lenght, separated
> substrings in string "1" compared to string 2.

  You want to find a set of substrings of string 1, such that:

    1) no two substrings in the set overlap each other
    2) no substring in the set has more than k of its characters found in
         string 2
    3) the longest possible substring subject to condition (2) is part
         of the set

  That is, given string1="abcdc" and string2="bc" and k=2, you would
rather see the set {"abcd", "c"} than the set {"abc", "dc"}. Correct?

> Generally, i search for algorithm
>
> given N strings (about 500-1000 chars)
> for EVERY string find substring unique or individual in set of N strings.
> So, if got this substring, I can say: this is exclusively from THIS ONE
> string, not other one, in any case.

  Now *this* sounds like a useful programming project! But it sounds
vastly different from what you were trying to say earlier. :) This
sounds like you are trying to develop an algorithm that will take the
strings

   classic vanilla
   chocolate torte
   chocolate chip
   pistachio

and produce a set of output strings something like:

   v
   to
   hip
   pi

that could, for instance, be plugged into the regular expression

   *{string}*

to match the requisite input string; e.g., the regex *hip* matches
the input string "chocolate chip" but no other; the regex *to* matches
the input string "chocolate torte" but no other; et cetera.
  This could have some really useful applications!

> but i don't want ANY exclusive substring, but maximal exclusive!
> No whitespaces, no sequences, only substrings.

  Two problems: Define "maximal exclusive." Use complete sentences
and as much detail as you can. Secondly, define "whitespaces."
How does the definition of "whitespace" fit into your previous
problem statement, which didn't involve any special kinds of characters?
Can you make the "whitespace" issue into a "special case" of your
N-string problem?

> Please, try to get it and maybe propose sth.
> Maybe exist any good solution. Just i don't know.

  Oh, and don't get lazy. "sth" is not a word. "Something" is a
word, so you should use it instead. If you start cutting corners
now that you're just making a bit of progress, you won't get
anywhere. Math is like that.

-Arthur



Relevant Pages

  • Re: How does this work?
    ... Let's say the string value is ... The value of the first argument is the substring that matched. ... the match extends as long as the delimiter characters don't match. ... So, as you can probably see now, it is actually a simple template engine ...
    (comp.lang.javascript)
  • Re: Deleting substrings
    ... substring is present in the string or not. ... We start at the beginning of the string. ... We copy the characters from the point immediately after ... redundant scans of the host string. ...
    (comp.programming)
  • Re: split a string
    ... >I have a string of N characters. ... >more ‘*' characters, which are the delimiter for the substring. ...
    (alt.comp.lang.learn.c-cpp)
  • Re: [QUIZ] Longest Repeated Substring (#153)
    ... My hack at the substring problem is based on suffix ... in the original string. ... def initialize ...
    (comp.lang.ruby)
  • Favicon type detection - possible?
    ... this is my cobbled together browser detection ... {string: navigator.userAgent, ... subString: "OmniWeb", ... {// for newer Netscapes ...
    (comp.lang.javascript)