Modeling the three-dimensional structure of protein molecules is of clear biomedical importance, driven by two powerful forces. First is the realization that proteins carry out almost all essential functional and structural tasks in living systems by virtue of their folded shape; almost all drugs depend on a small molecule inhibiting a malfunctioning protein through shape complementarity in three dimensions. Second is the rapid determination of genomic protein sequence data, which has doubled in the past 28 months, complemented by equally rapid determination of novel protein structural data; structural coverage of sequences (the percentage of sequences with some structural information) is over 50% and is increasing thanks to structural genomics initiatives. This proposal continues previous aims by developing and improving methods for accurate homology modeling (where the structure of a related sequence is known). Current aims extend to the general problem of ab initio structure prediction (where no structure of any related sequence is known). This extension is possible because of recent progress and the realization that homology modeling and ab initio structure prediction share a common philosophy rooted in the decoy/discriminate paradigm we pioneered in 1995. Specifically, both protein modeling and structure prediction have four interrelated stages: (a) formulation of energy functions, (b) application of move sets, (c) generation of decoy structures and (d) assessment of predicted structures. These four stages are iterated to improve both the decoys and the energy functions so as to obtain ever better predicted structures. Analysis of experimentally determined sequences and structures goes hand in hand with this planned modeling to give us an overview of the extent of the problem and the progress made in the field. Having been drawn to such analysis in the previous funding period, we expect to continue this activity with particular focus on the 'dark matter', those sequences for which we have the least information.
We are well aware that these are ambitious aims but are encouraged by recent progress. Our methodology uses knowledge-based or statistical energy functions, but our philosophy is firmly rooted in the physical nature of the systems. As such, our work will have far-reaching applications to theoretical studies of molecular function, including modeling of ligand binding, modeling of protein-protein interactions and more general simulation of protein function. Our five specific aims are: (1) better knowledge-based energy functions, (2) general and novel move sets, (3) decoy generation by uniform sampling and powerful search, (4) assessment of structures to reveal deficiencies and (5) analysis of uncharacterized sequences in terms of clustering sequence domains into new families. Achieving these aims will advance our fundamental understanding of molecular structure: predicted molecular structure can guide experiments and lead to further understanding of molecular mechanisms.
PUBLIC HEALTH RELEVANCE: Modeling the three-dimensional structure of protein molecules is of clear biomedical importance for two reasons: (1) proteins carry out almost all essential functional and structural tasks in living systems by virtue of their folded shape (almost all drugs depend on a small molecule binding to and inhibiting a malfunctioning protein through shape complementarity in three dimensions); and (2) genomic protein sequence data is growing rapidly, having doubled in the past 28 months. This proposal continues previous aims by developing improved methods for accurate homology modeling (where the structure of a related sequence is known) and also extends the aims to the general problem of ab initio structure prediction (where no structure of any related sequence is known).