Re: Pattern matching problem
From: Wolf Blaum (blaum_at_uthscsa.edu)
Date: 02/26/04
- Next message: B McKee: "subroutine placement (Layout conventions)"
- Previous message: Tim: "Re: newline or CR with join function"
- In reply to: Henry Todd: "Re: Pattern matching problem"
Date: Thu, 26 Feb 2004 19:51:47 +0100
To: Henry Todd <h.j.todd@cs.rhul.ac.uk>, beginners@perl.org
On Thursday 26 February 2004 12:28, Henry Todd generously enriched virtual
reality by making up this one:
> On 2004-02-26 00:43:21 +0000, blaum@uthscsa.edu (Wolf Blaum) said:
> > As I understand biology, there are 4 nucleotides, which gives 4**2
> > combinations for doublets. So you need 8 vars to count the occurrences of
> > all doublets. Worse for triplets. (24)
> > As I understand genetics, triplets are what matters, since the RNA
> > transcriptase reads triplets as code for amino acids. You might give me
> > updates on my biol. knowledge :-)
>
> Wolf -
>
> It's been a while since my A-Level biology days, but I believe you're
> correct. However, this particular coursework was to create two programs
> for a different purpose than I think you're imagining:
Hi,
as you can tell from my mail, it has been a while since my basic math classes,
too: 4**2 = 8? 4**3 = 24? Uhuh... (That should be 16 doublets and 64 triplets.)
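However, the real bug was

    for (my $i = 0; $i < length($sequence) - $wordsize; $i++) {

which should be

    for (my $i = 0; $i <= length($sequence) - $wordsize; $i++) {

because otherwise it misses the last doublet/triplet/...: the last word starts
at position length($sequence) - $wordsize, and < stops one position short.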
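A minimal, self-contained sketch of the counting loop with the corrected bound
(the sub name count_words is made up for illustration, not taken from Henry's
script):

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping word (w-mer) of length
# $wordsize in $sequence. The last valid start position is
# length($sequence) - $wordsize, hence <= rather than < in the loop bound.
sub count_words {
    my ($sequence, $wordsize) = @_;
    my %count;
    for (my $i = 0; $i <= length($sequence) - $wordsize; $i++) {
        $count{ substr($sequence, $i, $wordsize) }++;
    }
    return \%count;
}

# 'ACGTAT' has five doublets: AC, CG, GT, TA, AT.
# With < instead of <=, the final AT would be skipped.
my $counts = count_words('ACGTAT', 2);
for my $word (sort keys %$counts) {
    print "$word\t$counts->{$word}\n";
}
```

The same sub gives triplets (or any w-mer) by passing 3 (or w) as $wordsize.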
> transition.pl: returns tables of transition probabilities for plus and
> minus models (exon and non-exon regions) as well as beta values
> (log-odds ratios) to compare the two models.
>
> The transition probability for AT for example (the probability that
> adenine will be followed by thymine) is calculated thus:
>
> tp(AT) = |AT| / |A_|
>
> The total number of occurrences of "AT" divided by the total number of
> "A" followed by anything.
>
> The program can also write the transition probabilities to a file to be
> used as input for the other program...
OK, but once you end up with a hash that has all the doublets as its keys and
their frequencies as values, that should be doable, as long as you know the
members of your alphabet.
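A rough sketch of that step, assuming the doublet counts sit in a hash keyed by
doublet and the alphabet is passed in explicitly (transition_probs is a
hypothetical name). It applies tp(XY) = |XY| / |X_| one row (first letter X)
at a time:

```perl
use strict;
use warnings;

# Hypothetical helper: turn doublet counts into transition probabilities.
# |X_| is the total count of doublets whose first letter is X.
sub transition_probs {
    my ($counts, @alphabet) = @_;
    my %tp;
    for my $x (@alphabet) {
        # |X_|: sum over every possible second letter
        my $total = 0;
        $total += $counts->{ $x . $_ } // 0 for @alphabet;
        for my $y (@alphabet) {
            # A letter that never starts a doublet gets 0 across its row
            $tp{ $x . $y } = $total ? ($counts->{ $x . $y } // 0) / $total : 0;
        }
    }
    return \%tp;
}

my %counts = (AA => 2, AC => 1, AT => 1, CA => 3);
my $tp = transition_probs(\%counts, qw(A C G T));
printf "tp(AT) = %.2f\n", $tp->{AT};    # 1 / (2 + 1 + 0 + 1) = 0.25
```

Setting empty rows to 0 is an arbitrary choice here; since only the alphabet
list changes, the same sub would work for an amino-acid alphabet.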
I don't know if there is such a thing as transition probabilities for codons
(i.e. triplets) as well. If there is, it should show up as transition
probabilities for amino acids. In that case, creating the hash of w-mers is
done by just feeding the script another sequence; the only thing to change
would be to add knowledge about the AA alphabet to your script.
> simulation.pl: which asks the user to specify the length of the
> sequence they want, then generates it according to the model file used
> as input (by simulating a Markov chain). So if you supply a file
> containing the transition probabilities of a typical exon (coding)
> region, the simulation will use them to generate a typical exon
> sequence.
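For the generating side, a first-order Markov chain can be sketched like this,
assuming the model is a hash of doublet => tp(doublet) whose rows each sum to 1
(simulate is a hypothetical name). The first letter is picked uniformly at
random, which is purely an assumption: the model as described has no start
probabilities.

```perl
use strict;
use warnings;

# Hypothetical sketch: generate a sequence of $length letters in which each
# next letter Y is drawn with probability tp(prev . Y).
sub simulate {
    my ($tp, $length, @alphabet) = @_;
    return '' if $length <= 0;
    # Assumed start: uniform over the alphabet
    my $seq = $alphabet[ int rand @alphabet ];
    while (length($seq) < $length) {
        my $prev = substr($seq, -1);
        my $r    = rand();              # uniform in [0, 1)
        my $next = $alphabet[-1];       # fallback against rounding error
        # Walk the cumulative distribution of row $prev
        for my $y (@alphabet) {
            $r -= $tp->{ $prev . $y } // 0;
            if ($r < 0) { $next = $y; last; }
        }
        $seq .= $next;
    }
    return $seq;
}

# Usage with a uniform model (every tp = 0.25)
my @dna = qw(A C G T);
my %uniform;
for my $x (@dna) { $uniform{"$x$_"} = 0.25 for @dna }
print simulate(\%uniform, 20, @dna), "\n";
```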
This is getting really off topic, but just out of interest: how do you choose
which letter to start with, since there is no tp for nothing followed by
whatever?
Sounds like a fun problem :)
G'day, Wolf