A detailed and accurate understanding of the structure of proteins is one cornerstone of modern biomedical research, and an explicit goal of the NIH is to define the structure of all proteins either by accurate experimental determination or comparative model-building. The most successful structure prediction approaches employ empirical knowledge-based energy terms derived from features of known protein structures, most notably single-residue φ,ψ-distributions, backbone-dependent side chain rotamer preferences, and tight packing criteria. One known unrealistic feature of these prediction programs is the assumption of a fixed ideal geometry for the backbone. The driving hypothesis behind this proposal is that there exists a largely unappreciated but real, systematic, significant and pervasive variation in backbone bond angles and peptide planarity that occurs as a function of backbone torsion angles, and accounting properly for this variation will be required to achieve X-ray crystal structure quality for comparative models. The overall goal of this work is to generate accurate empirical values for this covalent variation that will lead to tangible improvements in the accuracy of structures produced by comparative modeling and de novo structure prediction as well as by X-ray crystallography.
We propose to achieve this overall goal by pursuing the following three specific aims: 1) to design, develop, and make available a flexibly-searchable database containing bond lengths, bond angles, and torsion angles for all structures known at better than 1.75 Å resolution (currently ~500,000 residues); 2) to use conventional query-based and modern machine learning approaches to derive accurate empirical information from the database about the systematic correlation of local conformation with variations in covalent geometry; and 3) to create a modular conformation-dependent expected covalent geometry library and to facilitate its incorporation into leading applications for comparative and crystallographic protein structure modeling. With the dramatically increased number of ultrahigh-resolution crystal structures now known, the time is ripe for construction of this Protein Geometry Database that will provide facile access to a massive treasure trove of reliable and detailed empirical information about protein structure. To be done well, this work will require painstaking attention to detail and an intimate familiarity with the limitations of crystallographic refinement and the principles of protein structure. Dr. Karplus is well-suited to lead this work as he has a 20+-year track record of quality crystallographic structure determinations combined with contributions of more general insights into protein structure, among them the pioneering characterization of the conformation-dependent variations in covalent geometry that serves as this project's foundation. Collaborations with world-leading groups in structure prediction, in crystallographic refinement and structure validation, and in knowledge-based library development ensure a rapid and effective translation of the gleaned information into improvements in protein modeling.
PUBLIC HEALTH RELEVANCE: Proteins are responsible for carrying out most of the processes of life and their function depends exquisitely on their structure, even on the tiniest structural details. For this reason, determining accurate structures of proteins is a cornerstone of modern biomedical research. This work aims to bring about a universal improvement in the accuracy with which protein structures can be built.