The long-term goal of our research is to understand the flow of information from the genome to the phenotype of organisms. In this proposal, we will attempt to use Bayesian networks and near-optimal sequence alignments to represent protein secondary structures and motifs.

A Bayesian network describes the likelihood of each amino acid at each position in a motif, as well as the dependence of the amino acid at one position on the amino acids at other positions. Hence, a Bayesian network can simultaneously describe both the conservation of amino acids at single positions and the conservation of correlations between pairs of positions. Conserved amino acids result from evolutionary selection for a specific amino acid, or type of amino acid, at one position in a protein structure; these positions often have important functional or structural requirements. Correlated changes between amino acids generally result from side-chain–side-chain interactions between pairs of amino acids in a protein's structure. The types of correlations we have represented with Bayesian networks include, among others, electrostatic charge, hydrophobicity, hydrogen-bond donors and acceptors, and inversely correlated packing volumes. These Bayesian networks can be used 1) to discover side-chain–side-chain interactions within protein motifs and 2) to search sequence databases for motifs showing both correlations and conserved amino acids.

Near-optimal alignments between two sequences can reveal which regions have been more or less highly conserved, using the information contained in only those two sequences. The most highly conserved regions correspond to the most highly structured regions, and the most variable regions correspond to loops, coils, and other hypervariable regions. We propose to use near-optimal alignments to display the conserved secondary structures and the hypervariable regions of proteins, and we will use secondary-structure-specific amino acid substitution matrices to provide specificity.
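The Bayesian-network description above can be illustrated with a minimal sketch. Several assumptions here are ours, not the proposal's: the motif is given as a gapless alignment of equal-length sequences; each position gets at most one parent, chosen as the earlier position with the highest empirical mutual information above a hypothetical 0.5-bit threshold (restricting parents to earlier positions keeps the graph acyclic); conditional probabilities use add-one pseudocounts; and sequences are scored against a uniform 1/20 background. The function names are illustrative only, not part of an existing tool.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_counts(seqs, i):
    """Count the residues observed at motif position i."""
    return Counter(s[i] for s in seqs)


def mutual_information(seqs, i, j):
    """Empirical mutual information (bits) between motif positions i and j."""
    n = len(seqs)
    pi, pj = column_counts(seqs, i), column_counts(seqs, j)
    pij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def choose_parents(seqs, threshold=0.5):
    """Hypothetical structure rule: each position takes as parent the earlier
    position with the highest mutual information above `threshold`, or None."""
    parents = {}
    for j in range(len(seqs[0])):
        best = None
        for i in range(j):
            mi = mutual_information(seqs, i, j)
            if mi > threshold and (best is None or mi > best[1]):
                best = (i, mi)
        parents[j] = best[0] if best else None
    return parents


def fit_cpts(seqs, parents, pseudo=1.0):
    """Estimate P(a_j) or P(a_j | a_parent) with pseudocounts."""
    k = len(AMINO_ACIDS)
    cpts = {}
    for j, parent in parents.items():
        if parent is None:
            counts = column_counts(seqs, j)
            total = len(seqs) + pseudo * k
            cpts[j] = {a: (counts[a] + pseudo) / total for a in AMINO_ACIDS}
        else:
            table = {}
            for pa in AMINO_ACIDS:
                rows = Counter(s[j] for s in seqs if s[parent] == pa)
                total = sum(rows.values()) + pseudo * k
                table[pa] = {a: (rows[a] + pseudo) / total for a in AMINO_ACIDS}
            cpts[j] = table
    return cpts


def log_odds(seq, parents, cpts, background=1 / 20):
    """Log2-odds of seq under the network versus a uniform background."""
    score = 0.0
    for j, parent in parents.items():
        if parent is None:
            p = cpts[j][seq[j]]
        else:
            p = cpts[j][seq[parent]][seq[j]]
        score += math.log2(p / background)
    return score


if __name__ == "__main__":
    # Toy motif: conserved G at position 1; positions 0 and 2 carry
    # opposite charges, mimicking a salt bridge.
    motif = ["KGE", "EGK", "KGE", "EGK", "RGD", "DGR"]
    parents = choose_parents(motif)
    cpts = fit_cpts(motif, parents)
    print(parents)
    print(log_odds("KGE", parents, cpts), log_odds("KGK", parents, cpts))
```

In this toy motif, position 2 has two E and two K residues, so a model that treats positions independently would score "KGE" and "KGK" equally; the network links position 2 to position 0 and therefore prefers the charge-complementary "KGE".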
The goals of this proposal are to 1) build a database of Bayesian networks that represent protein motifs, 2) test these networks for their ability to detect motifs using test sets and cross-validation, 3) compare these networks with other methods for searching protein databases, 4) build an integrated set of Bayesian networks to predict protein secondary structure, 5) compare these secondary-structure predictions with those of existing methods, 6) build a near-optimal sequence alignment workbench, and 7) predict structured and unstructured regions in proteins from near-optimal alignments.
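Goal 7 relies on the observation that near-optimal alignments of two sequences agree in conserved regions and disagree in variable ones. Below is a minimal sketch of one way to measure this, assuming global alignment with linear gap costs and a toy identity score (+2 match, -1 mismatch, -2 per gap) standing in for the secondary-structure-specific substitution matrices. A forward table F and a backward table B give the best score of any alignment that pairs residue a[i] with b[j]; for each residue of a, the sketch counts how many partners in b occur in alignments scoring within `delta` of the optimum. All names and parameter values are hypothetical.

```python
def identity_score(x, y):
    """Toy substitution score (assumption): +2 for identity, -1 otherwise."""
    return 2 if x == y else -1


def global_tables(a, b, score, gap):
    """Forward (prefix) and backward (suffix) global-alignment score tables."""
    n, m = len(a), len(b)
    neg = float("-inf")
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if i > 0 and j > 0:
                best = max(best, F[i - 1][j - 1] + score(a[i - 1], b[j - 1]))
            if i > 0:
                best = max(best, F[i - 1][j] + gap)
            if j > 0:
                best = max(best, F[i][j - 1] + gap)
            F[i][j] = best
    B = [[neg] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            best = neg
            if i < n and j < m:
                best = max(best, B[i + 1][j + 1] + score(a[i], b[j]))
            if i < n:
                best = max(best, B[i + 1][j] + gap)
            if j < m:
                best = max(best, B[i][j + 1] + gap)
            B[i][j] = best
    return F, B


def near_optimal_partners(a, b, score, gap, delta):
    """Return the optimal score and, for each residue of a, the number of
    residues of b it is paired with in some alignment scoring >= optimum - delta."""
    F, B = global_tables(a, b, score, gap)
    optimum = F[len(a)][len(b)]
    partners = []
    for i in range(1, len(a) + 1):
        count = sum(
            1
            for j in range(1, len(b) + 1)
            # Best alignment forced to pair a[i-1] with b[j-1].
            if F[i - 1][j - 1] + score(a[i - 1], b[j - 1]) + B[i][j] >= optimum - delta
        )
        partners.append(count)
    return optimum, partners


if __name__ == "__main__":
    for delta in (0, 2, 4):
        print(delta, near_optimal_partners("KLAGGGSTVE", "KLAGGSTVE", identity_score, -2, delta))
```

Residues that keep exactly one partner as `delta` grows are well determined and would be candidate structured regions; residues with several partners, or with none (gapped in every near-optimal alignment), mark the ambiguous, variable stretches.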