Re: Calculating values of a 2d array by comparison of 2 strings



Thus spoke Mirco Wahab (on 2006-12-15 20:56):

--- 8< --------------

use strict;
use warnings;


After studying Mark Donavan's posting (which uses
almost the same approach) I reworked the code
a bit to be more 'perlish' by applying some
idioms found in his code.

I'll attach the code here; please give me
a hint whether that's the right approach to
your problem ...
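
For reference, the counting step on its own can be sketched
like this. It is only a minimal sketch: the two short strings
are made up, the sub name pair_counts is hypothetical, and it
uses a hash of hashes keyed by the letters themselves rather
than the index map used in the full code.

```perl
use strict;
use warnings;

# Count how often a letter of the first string is paired with a
# letter of the second string at the same position.
# Result: $count{letter_in_s2}{letter_in_s1} = number of pairs
sub pair_counts {
    my ($s1, $s2) = @_;

    # compare only up to the length of the shorter string
    my $len = length($s1) < length($s2) ? length($s1) : length($s2);

    my %count;
    for my $p (0 .. $len - 1) {
        $count{ substr($s2, $p, 1) }{ substr($s1, $p, 1) }++;
    }
    return \%count;
}

# toy input, not real sequence data
my $c = pair_counts('GAVLG', 'GAILGK');
for my $row (sort keys %$c) {
    for my $col (sort keys %{ $c->{$row} }) {
        print "$row/$col: $c->{$row}{$col}\n";
    }
}
```

Identical positions end up on the diagonal (row letter equals
column letter), substitutions end up off the diagonal.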

Regards

Mirco

==> [revisited code]

use strict;
use warnings;

my $seq1 = # human endogenous retrovirus type K (VPK3_HUMAN)
' WASQVSENRP VCKAIIQGKQ FEGLVDTGAD VSIIALNQWP KNWPKQKAVT GLVGIGTASE
VYQSTEILHC LGPDNQESTV QPMITSIPLN LWGRDLLQQW GAEITMPAPL YSPTSQKIMT
KMGYILGKGL GKNEDGIKIP VEAKINQKRE GIGYPF ';

my $seq2 = # VPK12_HUMAN
' WASQVSENRP VCKAIIQGKQ FEGLVDTGAD VSIIALNQWP KNWPKQKAVT GLVGIGTASE
VYQSTEILHC LGPDNQESTV QPMITSIPLN LWGRDLLQQW GAEITMPAPL YSPTSQKIMT
KMGYIPGKGL GKNEDGIKVP VEAKINQERE GIGYPF ';

$seq1 =~ s/[^A-Z_]//g; # straighten out sequences
$seq2 =~ s/[^A-Z_]//g;

# get the shortest sequence length ;-)
# (numeric sort; the default sort compares as strings, so 100 would come before 99)
my $len = (sort { $a <=> $b } length($seq1), length($seq2))[0];

# read the good ole one letter amino acid codes from
# __DATA__ and generate some kind of an index map
my $index = 0;
my %alist = map { (split)[2] => $index++ } <DATA>;
my $acids = join '', sort keys %alist;

my @table; # this table will hold the homology pair statistics

for my $p (0 .. $len-1) { # count statistics
    my ($m, $n) = @alist{ substr($seq1, $p, 1), substr($seq2, $p, 1) };
    ++$table[$n][$m];
}

# here we are already done with the sequences
# now the fun part, print out matrix
h_legend($acids);
for my $row (0 .. length($acids)-1) {
    printf "%-3s", substr($acids, $row, 1);
    for my $col (0 .. length($acids)-1) {
        printf "%3s", defined $table[$row][$col] ? $table[$row][$col] : '';
    }
    printf "%3s\n", substr($acids, $row, 1);
}
h_legend($acids);
# done with printing

sub h_legend {
    print "   "; # three spaces, matching the "%-3s" row label column
    printf "%3s", $_ for split '', shift;
    print "\n";
}

__DATA__
Alanine ALA A
Cysteine CYS C
Aspartate ASP D
Glutamate GLU E
Phenylalanine PHE F
Glycine GLY G
Histidine HIS H
Isoleucine ILE I
Lysine LYS K
Leucine LEU L
Methionine MET M
Asparagine ASN N
Proline PRO P
Glutamine GLN Q
Arginine ARG R
Serine SER S
Threonine THR T
Valine VAL V
Tryptophan TRP W
Tyrosine TYR Y
GAP _ _


