Re: How to put brackets in a string given substrings



Wijaya Edward wrote:
>
> I have the following problem.
> I am trying to put brackets into a string, given a set of its substrings.
> Each bracketed region is bounded by the given substrings.
> For example, given the input strings and their substrings:
>
> Strings:
> 1.CCCATCTGTCCTTATTTGCTG
> 2.ACCCATCTGTCCTTGGCCAT
> 3.CCACCAGCACCTGTC
> 4.CCCAACACCTGCTGCCT
> 5.CTGGGTATGGGT
> 6.AGGAACTTGCCTGTACCACAGGAAG
>
> Substrings:
> 1. ATCTG ATTTG
> 2. CCATC
> 3. CCACC CCAGC GCACC
> 4. CCAAC ACACC
> 5. GTATG TGGGT
> 6. CAGGA AGGAA
>
> The desired answers are:
> 1. CCC[ATCTG]TCCTT[ATTTG]CTG
> 2. AC[CCATC]TGTCCTTGGCCAT
> 3. [CCACCAGCACC]TGTC *
> 4. C[CCAACACC]TGCTGCCT *
> 5. CTGG[GTATGGGT] **
> 6. AGGAACTTGCCTGTACCA[CAGGAA]G **
>
> Please note that in examples 3 and 4 the substrings overlap.
> Note also that in examples 5 and 6 some substrings occur
> twice. So the answers for examples 5 and 6 are NOT
>
> 5. C[TGGGTATGGGT] ----this is wrong
> 6. [AGGAA]CTTGCCTGTACCA[CAGGAA]G ----this is wrong
>
> since they do not follow the order of the given substrings (the array; see my code).
> Below is my code. It only works for examples 1 and 2.
> How can I approach this problem so that it handles all of these cases?
>
>
> __BEGIN__
> #!/usr/bin/perl -w
> use strict;
>
> my $s1 ='CCCATCTGTCCTTATTTGCTG'; my @a1 = qw(ATCTG ATTTG);
> my $s2 ='ACCCATCTGTCCTTGGCCAT'; my @a2 = qw(CCATC);
> my $s3 ='CCACCAGCACCTGTC'; my @a3 = qw(CCACC CCAGC GCACC);
> my $s4 ='CCCAACACCTGCTGCCT'; my @a4 = qw(CCAAC ACACC);
> my $s5 ='CTGGGTATGGGT'; my @a5 = qw(GTATG TGGGT);
> my $s6 = 'AGGAACTTGCCTGTACCACAGGAAG'; my @a6 = qw( CAGGA AGGAA );
>
> #These two work fine.
> put_bracket($s1,\@a1);
> put_bracket($s2,\@a2);
>
> #But the rest don't work
> put_bracket($s3,\@a3);
> put_bracket($s4,\@a4);
> put_bracket($s5,\@a5);
> put_bracket($s6,\@a6);
>
> sub put_bracket
> {
> my ($str,$ar) = @_;
> my $bstr;
> my $slen = length $ar->[0];
>
> foreach my $subs ( @$ar )
> {
> my $idx = index($str,$subs);
> my $bgn = $idx;
> my $end = $idx+$slen+1;
> substr($str,$bgn,0,"[");
> substr($str,$end,0,"]");
> }
> print "$str\n";
> return ;
> }
>
> __END__
>
> Really hope to hear from you again.

This appears to do what you want:

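sub put_bracket {
    my ( $str, $ar ) = @_;

    my $x = 0;    # offset where the previous substring matched
    for my $subs ( @$ar ) {
        # search from the previous match onward, so overlaps are allowed
        if ( substr( $str, $x ) =~ /$subs/i ) {
            $x += $-[ 0 ];    # $-[0] is the match start within the substr
            # mark the matched region by lower-casing it in place
            substr( $str, $x, length $subs ) =~ tr/A-Z/a-z/;
        }
    }
    # bracket each lower-case run and restore it to upper case
    $str =~ s/([a-z]+)/[\U$1\E]/g;

    print "$str\n";
    return;
}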
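
The version above marks matched regions by lower-casing them, so it assumes the input string is all upper case to begin with. If that assumption does not hold, the same left-to-right search can probably be done with explicit start/end offsets instead. What follows is only a sketch of that alternative, not the code posted above: the name bracket_substrings is made up for illustration, it returns the string rather than printing it, and it uses a case-sensitive index() where the version above matches case-insensitively.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch).
# Returns $str with each matched region wrapped in brackets.
sub bracket_substrings {
    my ( $str, $subs ) = @_;

    my @spans;       # merged [start, end) pairs, in left-to-right order
    my $from = 0;    # each search begins at the previous match's start

    for my $sub (@$subs) {
        my $start = index( $str, $sub, $from );
        next if $start < 0;    # substring not found: skip it
        my $end = $start + length $sub;

        if ( @spans && $start <= $spans[-1][1] ) {
            # overlaps or touches the previous span: extend that span
            $spans[-1][1] = $end if $end > $spans[-1][1];
        }
        else {
            push @spans, [ $start, $end ];
        }
        $from = $start;
    }

    # Insert brackets from right to left so earlier offsets stay valid.
    for my $span ( reverse @spans ) {
        substr( $str, $span->[1], 0, ']' );
        substr( $str, $span->[0], 0, '[' );
    }
    return $str;
}

# Examples 5 and 6 from the question.
print bracket_substrings( 'CTGGGTATGGGT', [qw(GTATG TGGGT)] ), "\n";
# prints CTGG[GTATGGGT]
print bracket_substrings( 'AGGAACTTGCCTGTACCACAGGAAG', [qw(CAGGA AGGAA)] ), "\n";
# prints AGGAACTTGCCTGTACCA[CAGGAA]G
```

Because each search starts at the previous match rather than at the beginning of the string, the second substring in examples 5 and 6 is found at its later occurrence, which is what keeps the brackets in the order given by the array.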



John
--
use Perl;
program
fulfillment
.


