The goal of this proposal is a molecular-level description of protein unfolding, and by extension folding, using realistic molecular dynamics (MD) simulations in solution. Both the general and sequence-specific rules of unfolding will be pursued. The general rules will be investigated using a large database of protein unfolding trajectories that already exists in the lab, to which new trajectories will be added. So far, this database contains nearly 11,000 simulations of more than 2,200 protein and peptide systems. This repository represents the largest collection of protein simulations and protein structures in the world. The simulations were designed so that representatives of all protein folds will eventually be investigated, working from the most to the least populated folds. The current set represents over 80% of all known protein structures. We have already developed a novel relational/multidimensional database to house these data. Specific Aim 1 of this proposal seeks to determine the general rules of protein unfolding by mining this database. In addition, multiple representatives of highly populated folds are being investigated to determine sequence-specific effects in Specific Aim 2. Our hypothesis is that all-atom MD simulations of isolated proteins in solution can provide continuous and realistic protein unfolding pathways and that the general rules for unfolding and folding can be determined once a large number of protein folds have been simulated. Experimental studies indicate that most relatives within a fold family fold by the same mechanism, but there are some exceptions. Consequently, sequence-specific effects will be determined by investigating multiple members of four common fold families with different architectures. Specific Aim 3 focuses on the unfolding of structural motifs in isolation and in different structural contexts, i.e., within different structures.
Finally, Specific Aim 4 focuses on characterizing the unfolding behavior and the sequence determinants of unfolding for a pair of designed proteins that share high sequence identity but adopt different folds with different functions.

PUBLIC HEALTH RELEVANCE: Protein folding remains one of the most important unsolved problems in molecular biology, and it represents an important missing link necessary for full utilization of the information becoming available from the mapping of genomic sequences. Characterization of the unfolding process is equally important, both for fully understanding a fundamental biochemical phenomenon and for the light it sheds on the folding process. An understanding of protein folding and unfolding also has important implications for all biological processes, including protein degradation, protein translocation, aging, and many human diseases, including amyloid diseases and disorders associated with single-nucleotide polymorphisms.