Re: need an algo for string sorting

From: Thad Smith (thadsmith_at_acm.org)
Date: 08/19/04


Date: Thu, 19 Aug 2004 10:20:30 -0600

Raghavendra Mahuli wrote:

> I have many strings. I have to sort them.
> ...
> But the complex part is "equivalent-values". it can be defined that "ai" and
> "ae" are equivalent-values. Then, while sorting two strings, i have to
> consider that "ai" and "ae" are equal......only if they come at same
> position in both the strings....
> For EX1:
> str1 - dfahaesdhfkj
> str2 - dfahaisds
> In str1, ae appears at position 4 and 5
> In str2, ai appears also at position 4 and 5....
>
> So they are equivalent.....And i can replace ae in str1 with ai......
>
> EX2:
> str3 - haehasdg
> str4 - hhsdaiskdn
> In str3, ae appears at position 1 and 2
> In str4, ai appears also at position 4 and 5....
> So here the rule "equivalent -values" does not apply......So i cant
> replace "ae" in str3 with "ai".......
>
> In short, the "equivalent-values" applies only when the
> "position-matches"...
>
> To add to that such equivalent values are many......
> So can you please suggest me some way of solving this problem ....

How do haem, haim, and hagm sort?

If equivalent values only apply when position matches, then

 haem < hagm since there is no position match of equiv
 haem = haim equiv position
 hagm < haim no position match

That does not yield a strict ordering of the input strings. You need a
different means of ordering based on equivalent values. You can't solve
the problem until you resolve this issue.

The problem sounds like undeclared homework, so here are some general
approaches (after you solve the above problem):

1. Use a comparison function that compares character by character,
checking each for possible initial character of a combined value.
2. Another way would be to transform the original strings into a sort
index string which takes into account the special combinations.

Thad



Relevant Pages

  • Re: Unicode Support
    ... > Not knowing much about UTF-8 (my Unicode knowledge extends as far as ... > literal strings of this form as long as the character code for quote ... > can never appear in a MBCS (multibyte character sequence). ... then XP Notepad directly understands UNICODE and you can ...
    (alt.lang.asm)
  • Re: Need help on string manipulation
    ... better to convert strings to UCS-32 before manipulation? ... Characters represented by wchar_t must use one wchar_t per character, ... which may use a multibyte encoding. ... use some newer Unicode characters, if this is a problem for you, then ...
    (comp.lang.c)
  • Re: Copying string to byte array
    ... of Strings and the CryptEncrypt + CryptDecrypt APIs. ... binary data should not be held in String variables. ... a) not all character codes are valid in a given ...
    (microsoft.public.vb.general.discussion)
  • Re: Zero terminated strings
    ... standard library that told you how to encode strings.) ... this redundant terminator as well). ... (Other string libraries like Vstr have ...
    (comp.lang.c)
  • Re: GetOpenFilename()
    ... Specifies that the File Name list box allows multiple selections. ... the directory and file name strings are NULL ... You advance the pointer one character more. ...
    (microsoft.public.vc.language)