PROTDIST computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, the JTT matrix model, the PMB model, Kimura's 1983 approximation to these, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can also be corrected for gamma-distributed and gamma-plus-invariant-sites-distributed rates of change at different sites. Rates of evolution can vary among sites in a prespecified way, and also according to a hidden Markov model. The program can also make a table of percentage similarity among sequences. The distances can then be used in the distance matrix programs. PROTDIST is part of PHYLIP.
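Of the methods listed above, Kimura's 1983 approximation is simple enough to illustrate in a few lines: it takes the fraction p of sites at which two aligned sequences differ and computes D = -ln(1 - p - 0.2 p^2). The following Python sketch is illustrative only and is not PROTDIST's own code; the function name `kimura_distance` and the choice to skip sites containing a gap character `-` are assumptions made for this example.

```python
import math


def kimura_distance(seq_a: str, seq_b: str) -> float:
    """Kimura (1983) approximation: D = -ln(1 - p - 0.2 * p**2).

    p is the fraction of compared sites at which the two sequences differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Assumption for this sketch: sites where either sequence has a gap
    # ('-') are left out of the comparison.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")

    p = sum(a != b for a, b in pairs) / len(pairs)

    # The logarithm is undefined once the sequences differ too much.
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this approximation")
    return -math.log(inner)


if __name__ == "__main__":
    # Two of the 13-residue test sequences; they differ at 3 sites.
    print(kimura_distance("AACGTGGCCACAT", "AAGGTCGCCACAC"))
```

Because this formula looks only at the fraction of differing sites, it will not reproduce the matrix-based distances that the PAM, JTT, or PMB models give for the same sequences.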
(Note that although these may look like DNA sequences, they are being treated as protein sequences consisting entirely of alanine, cysteine, glycine, and threonine.)
    5   13
Alpha     AACGTGGCCACAT
Beta      AAGGTCGCCACAC
Gamma     CAGTTCGCCACAA
Delta     GAGATTTCCGCCT
Epsilon   GAGATCTCCGCCC
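The test data set follows the usual PHYLIP layout: a first line giving the number of species and the number of sites, then one line per species with the name in the first ten columns followed by the sequence. A minimal Python sketch of reading such a file is given here. The function name `parse_phylip_sequential` is hypothetical, and the sketch assumes that each sequence fits on a single line, which holds for this small example but not for every PHYLIP input file.

```python
def parse_phylip_sequential(text: str) -> dict[str, str]:
    """Parse a small PHYLIP-style file with one sequence per line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_species, n_sites = (int(tok) for tok in lines[0].split()[:2])

    sequences: dict[str, str] = {}
    for line in lines[1:1 + n_species]:
        name = line[:10].strip()             # name occupies columns 1-10
        residues = "".join(line[10:].split())  # drop any spacing blanks
        if len(residues) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(residues)}")
        sequences[name] = residues

    if len(sequences) != n_species:
        raise ValueError("fewer species found than the header declares")
    return sequences


# Rebuild the test data set, padding each name to ten columns.
TEST_DATA = [
    ("Alpha", "AACGTGGCCACAT"),
    ("Beta", "AAGGTCGCCACAC"),
    ("Gamma", "CAGTTCGCCACAA"),
    ("Delta", "GAGATTTCCGCCT"),
    ("Epsilon", "GAGATCTCCGCCC"),
]
SAMPLE = "    5   13\n" + "".join(f"{n:<10}{s}\n" for n, s in TEST_DATA)

if __name__ == "__main__":
    for name, seq in parse_phylip_sequential(SAMPLE).items():
        print(name, seq)
```

Slicing on a fixed column rather than splitting on whitespace matters because PHYLIP names may contain blanks.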
(Note that when the numerical options are not on, the output file produced is in the correct format to be used as an input file in the distance matrix programs.)
  Jones-Taylor-Thornton model distance

Name            Sequences
----            ---------

Alpha        AACGTGGCCA CAT
Beta         ..G..C.... ..C
Gamma        C.GT.C.... ..A
Delta        G.GA.TT..G .C.
Epsilon      G.GA.CT..G .CC

Alpha       0.000000  0.330447  0.625670  1.032032  1.354086
Beta        0.330447  0.000000  0.375578  1.096290  0.677616
Gamma       0.625670  0.375578  0.000000  0.975798  0.861634
Delta       1.032032  1.096290  0.975798  0.000000  0.226703
Epsilon     1.354086  0.677616  0.861634  0.226703  0.000000
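In the sequence listing of the output, a dot stands for a residue that is identical to the one at the same site in the first sequence (Alpha). The short Python sketch below shows how such an abbreviated row could be expanded back into a full sequence. The helper name `expand_dots` is hypothetical; it is an illustration of the notation, not part of PHYLIP.

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace each '.' in row with the residue from reference at that site."""
    # The output prints sequences in blocks of ten, so remove the spacing first.
    ref = reference.replace(" ", "")
    abbreviated = row.replace(" ", "")
    if len(ref) != len(abbreviated):
        raise ValueError("reference and row must cover the same number of sites")
    return "".join(r if c == "." else c for r, c in zip(ref, abbreviated))


if __name__ == "__main__":
    alpha = "AACGTGGCCA CAT"
    print(expand_dots(alpha, "..G..C.... ..C"))  # the Beta row
    print(expand_dots(alpha, "G.GA.CT..G .CC"))  # the Epsilon row
```

Expanding the Beta and Epsilon rows this way recovers the sequences given for those species in the input data set.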