Re: convert protein fasta stream into harsh table



zhong.huang@xxxxxxxxx wrote:
hi,

Can anyone suggest a simple way to convert a multi-sequence fasta stream
(in a Bio::SeqIO object) into a harsh table (sequence annotation as key,
sequence as value)?

They are called HASH tables, not HARSH tables.

The fasta file looks like this:


>gi|9049352|dbj|BAA99407.1| 3-methylcrotonyl-CoA carboxylase biotin-containing subunit [Homo sapiens]

MAAASAVSVLLVAAERNRWHRLPSLLLPPRTWVWRQRTMKYTTATGRNITKVLIANRGEIACRVMRTAKKLGVQTVAVYSEADRNSMHVDMADEAYSIGPAPSQQSYLSMEKIIQVAKTSAAQAIHPGCGFLSENMEFAE


>gi|4504067|ref|NP_002070.1| aspartate aminotransferase 1 [Homo sapiens]

MAPPSVFAEVPQAQPVLVFKLTADFREDPDPRKVNLGVGAYRTDDCHPWVLPVVKKVEQKIANDNSLNHEYLPILGLAEFRSCASRLALGD

I want to have the harsh table %seqharsh to hold sequences like this:

# my %seqharsh = ('seq1', 'MAAASAVSVL......',
#                 'seq2', 'MAPPSVFAEVPQ......');

I'm not seeing where the 'seq1' and 'seq2' values are coming from in your input. If I'm allowed to make up hash keys, the problem is pretty simple.
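
For example, if numbered keys are acceptable, a counter is enough. This is only a minimal sketch: the @sequences list, the shortened sequence strings and the name %seqhash are placeholders made up for illustration, not your real data.

```perl
use strict;
use warnings;

# Placeholder data: shortened stand-ins for the real sequences.
my @sequences = ('MAAASAVSVL', 'MAPPSVFAEVPQ');

my %seqhash;      # declared once, before the loop
my $count = 0;    # used to make up the keys seq1, seq2, ...

for my $sequence (@sequences) {
    # pre-increment so the first key is 'seq1'
    $seqhash{ 'seq' . ++$count } = $sequence;
}

print "$_ => $seqhash{$_}\n" for sort keys %seqhash;
```

In the Bio::SeqIO loop the same idea would mean keeping the counter outside the while loop and using 'seq' . ++$count as the key in place of an ID taken from the record.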

My code is like this:


my $seqio = Bio::SeqIO->new(-format => $format,
                            -file   => $file);

my %seqharsh; # declare the hash table

while ( my $seq = $seqio->next_seq ) {
    if( $seq->alphabet ne 'protein' ) {
        confess("Skipping non protein sequence...");
        next;
    }

    #write code here to assign each entry into harsh %seqharsh
    my $seqharsh{$seq->primary_id} = $seq->seq();

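You neither need nor want the 'my' there. %seqharsh is already declared above the loop, and 'my' cannot be applied to a single hash element (Perl rejects it at compile time). A plain assignment will add items to the hash: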
$seqharsh{$seq->primary_id} = $seq->seq;
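
Putting it together, the whole loop might look like the sketch below. Some things in it are my assumptions rather than anything from your post: the sub name seqs_to_hash, the hash name %seqhash, the hard-coded 'fasta' format, and taking the file name from the command line. It also uses warn where your code has confess, because confess (from Carp) dies, so the 'next' after it would never run.

```perl
use strict;
use warnings;

# Build a hash of ID => sequence from any object that offers next_seq(),
# such as a Bio::SeqIO stream. (seqs_to_hash is a made-up helper name.)
sub seqs_to_hash {
    my ($seqio) = @_;
    my %seqhash;    # declared once, outside the loop

    while ( my $seq = $seqio->next_seq ) {
        if ( $seq->alphabet ne 'protein' ) {
            # warn instead of confess, so the loop can carry on
            warn "Skipping non-protein sequence ", $seq->primary_id, "\n";
            next;
        }

        # Plain assignment, no 'my': adds one key/value pair per record.
        # If primary_id is not the ID you expect for fasta input,
        # display_id is another accessor worth trying.
        $seqhash{ $seq->primary_id } = $seq->seq;
    }
    return %seqhash;
}

# Usage (assumed): perl fasta2hash.pl proteins.fasta
if (@ARGV) {
    require Bio::SeqIO;    # loaded only when a file name is given
    my $seqio = Bio::SeqIO->new( -format => 'fasta', -file => $ARGV[0] );
    my %seqhash = seqs_to_hash($seqio);
    print "$_ => $seqhash{$_}\n" for sort keys %seqhash;
}
```

If you really want the annotation rather than the ID as the key, the description part of the header is available from $seq->desc, but descriptions are not guaranteed to be unique, so entries could overwrite each other.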