Knowledge of three-dimensional protein structure is indispensable in biomedical research. Protein structure and function are intimately linked, and thus structure facilitates drug discovery, aids investigations of protein-protein interactions, informs mutagenesis analysis, guides protein engineering and the design of new proteins, and provides a foundation for understanding the molecular basis of disease. However, the number of protein sequences available in the genomic era far exceeds the capacity of the main experimental structure determination techniques, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, resulting in a substantial sequence-structure gap. We address this ever-widening gap by developing and disseminating novel protein structure modeling tools. This renewal project is a new collaboration between experts in computational modeling (Cheng) and experimental structural biology (Tanner). We plan to develop innovative, integrated machine learning (e.g., deep learning), data mining, and statistical modeling methods to address major challenges in both template-based and template-free (ab initio) structure modeling. We will apply these tools to the aldehyde dehydrogenase (ALDH) superfamily, a group of enzymes that are involved in numerous important biological processes and whose mutations are implicated in many diseases. The ALDH models will be experimentally validated using X-ray crystallography and biochemical assays. Furthermore, we will combine the modeling power of our structural Input-Output hidden Markov model with experimental small-angle X-ray scattering (SAXS) to predict the tertiary structures of large multi-domain proteins. This integration of computational and experimental sciences gives the project a distinctive position in the structure modeling field.