PROTDIST

PROTDIST computes a distance measure for protein sequences, using maximum likelihood estimates based on the Dayhoff PAM matrix, the JTT matrix model, the PMB model, Kimura's 1983 approximation to these, or a model based on the genetic code plus a constraint on changing to a different category of amino acid. The distances can also be corrected for gamma-distributed rates of change across sites, with or without a proportion of invariant sites. Rates of evolution can vary among sites in a prespecified way, and also according to a Hidden Markov model. The program can also produce a table of percentage similarity among sequences. The resulting distances can then be used as input to the distance matrix programs. PROTDIST is part of PHYLIP.
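
Of the models listed above, Kimura's 1983 approximation is the only one that reduces to a closed-form expression: it uses just the fraction p of sites at which two sequences differ, D = -ln(1 - p - 0.2 p^2). The sketch below illustrates that formula only. The function name `kimura_distance` is hypothetical, and returning infinity when the logarithm is undefined is an assumption of this sketch, not a statement about how PROTDIST itself handles that case.

```python
import math


def kimura_distance(p: float) -> float:
    """Kimura (1983) approximation: D = -ln(1 - p - 0.2 * p**2).

    p is the fraction of sites at which two sequences differ.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # The logarithm is undefined once p reaches roughly 0.854.
        # Returning infinity here is a choice made for this sketch.
        return math.inf
    return -math.log(arg)
```

Because the formula ignores which amino acids differ, it will generally not reproduce the values given by the matrix-based models such as JTT.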

Manual: http://evolution.genetics.washington.edu/phylip/doc/protdist.html

TEST DATA SET

(Note that although these may look like DNA sequences, they are being treated as protein sequences consisting entirely of alanine, cysteine, glycine, and threonine.)

   5   13
Alpha     AACGTGGCCACAT
Beta      AAGGTCGCCACAC
Gamma     CAGTTCGCCACAA
Delta     GAGATTTCCGCCT
Epsilon   GAGATCTCCGCCC
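
The test data set above can be read with a few lines of code. The following is a minimal sketch, assuming the simple layout shown here: a header line giving the number of sequences and the number of sites, then one line per sequence with the name in the first 10 columns. The helper names `read_phylip` and `fraction_differing` are hypothetical, and the similarity printed is a plain per-site comparison that makes no attempt to match PROTDIST's own treatment of gaps or ambiguity codes.

```python
from itertools import combinations

TEST_DATA = """\
   5   13
Alpha     AACGTGGCCACAT
Beta      AAGGTCGCCACAC
Gamma     CAGTTCGCCACAA
Delta     GAGATTTCCGCCT
Epsilon   GAGATCTCCGCCC
"""


def read_phylip(text):
    """Parse a one-line-per-sequence data set into {name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, n_sites = (int(tok) for tok in lines[0].split())
    seqs = {}
    for ln in lines[1:1 + n_seqs]:
        name = ln[:10].strip()              # name field: first 10 columns
        residues = ln[10:].replace(" ", "")  # blanks inside sequences are ignored
        if len(residues) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(residues)}")
        seqs[name] = residues
    if len(seqs) != n_seqs:
        raise ValueError(f"expected {n_seqs} sequences, got {len(seqs)}")
    return seqs


def fraction_differing(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


seqs = read_phylip(TEST_DATA)
for a, b in combinations(seqs, 2):
    p = fraction_differing(seqs[a], seqs[b])
    print(f"{a:8s} {b:8s} differing={p:.3f} similarity={100 * (1 - p):.1f}%")
```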

CONTENTS OF OUTPUT FILE (with all numerical options on)

(Note that when the numerical options are not on, the output file is in the correct format to be used as an input file for the distance matrix programs.)

 
  Jones-Taylor-Thornton model distance
 
Name            Sequences
----            ---------
 
Alpha        AACGTGGCCA CAT
Beta         ..G..C.... ..C
Gamma        C.GT.C.... ..A
Delta        G.GA.TT..G .C.
Epsilon      G.GA.CT..G .CC
 
 
 
Alpha       0.000000  0.330447  0.625670  1.032032  1.354086
Beta        0.330447  0.000000  0.375578  1.096290  0.677616
Gamma       0.625670  0.375578  0.000000  0.975798  0.861634
Delta       1.032032  1.096290  0.975798  0.000000  0.226703
Epsilon     1.354086  0.677616  0.861634  0.226703  0.000000
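
The listing above can be read back programmatically. In the sequence display, a "." stands for the same residue as the first sequence (Alpha) at that site, and the table that follows is a square matrix of pairwise distances. The sketch below is an illustration under stated assumptions: the helper names `expand_dots` and `read_square_matrix` are hypothetical, and the matrix reader assumes that names contain no blanks and that each row fits on a single line, as in this example.

```python
MATRIX = """\
Alpha       0.000000  0.330447  0.625670  1.032032  1.354086
Beta        0.330447  0.000000  0.375578  1.096290  0.677616
Gamma       0.625670  0.375578  0.000000  0.975798  0.861634
Delta       1.032032  1.096290  0.975798  0.000000  0.226703
Epsilon     1.354086  0.677616  0.861634  0.226703  0.000000
"""


def expand_dots(reference, row):
    """Replace each '.' in row with the residue of reference at that site."""
    row = row.replace(" ", "")  # drop the blank that separates blocks of 10
    if len(row) != len(reference):
        raise ValueError("row and reference must have the same length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def read_square_matrix(text):
    """Parse rows of 'name d1 d2 ...' into (names, list of rows)."""
    names, rows = [], []
    for ln in text.splitlines():
        parts = ln.split()
        if len(parts) < 2:
            continue  # skip blank lines
        names.append(parts[0])
        rows.append([float(tok) for tok in parts[1:]])
    if any(len(row) != len(names) for row in rows):
        raise ValueError("matrix is not square")
    return names, rows


beta = expand_dots("AACGTGGCCACAT", "..G..C.... ..C")
names, dist = read_square_matrix(MATRIX)

# A distance matrix should be symmetric with zeros on the diagonal.
n = len(names)
symmetric = all(dist[i][j] == dist[j][i] for i in range(n) for j in range(n))
zero_diagonal = all(dist[i][i] == 0.0 for i in range(n))
```

A check like this is a cheap safeguard before passing a matrix on to the distance matrix programs.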