DESCRIPTION (Taken from the Application): The development of a Northeast Structural Genomics Consortium is proposed to evaluate the feasibility, costs, economies of scale, and value of structural genomics. The primary goal of the pilot project is to develop integrated technologies for high-throughput (HTP) protein production and 3D structure determination, which would form key components of the scientific infrastructure required for the next stage of the genome project. The principal components of the project include the design, development, and testing of: i) bioinformatics methods for clustering and prioritizing candidate proteins for biophysical analysis; ii) biotechnologies and robotic methods for HTP expression plasmid construction, expression screening, and protein production; iii) HTP methods for analyzing the "foldedness" of expressed proteins by biophysical techniques, including NMR and circular dichroism spectroscopy; iv) cost-efficient production of ¹³C-, ¹⁵N-, ²H-, and/or SeMet-enriched protein samples suitable for NMR or crystallographic analysis; v) experimental and theoretical methods for parsing large multidomain proteins into domain-encoding segments; vi) robotic methods for HTP protein crystallization screening, solubility screening, and crystal manipulation; vii) new NMR pulse sequences that enable rapid collection of data suitable for high-resolution 3D structure determination; viii) automated computational methods for determining 3D protein structures from raw diffraction and/or NMR data; ix) methods for analyzing 3D protein structures to develop testable hypotheses about biochemical function; x) integrated project databases to track the reagents and information generated in a large-scale structural genomics effort; and xi) organized dissemination of the resulting expression plasmids, protein reagents, software, resonance assignments, functional annotations, and 3D structures.
A unique feature of this project will be the combination of strong efforts in both X-ray crystallography and solution-state NMR spectroscopy, allowing a rigorous, parallel evaluation of the complementary value of these two methods for genome-scale structure analysis. The primary genome targets for this methodology development will be eukaryotic model organisms that are the subjects of extensive functional genomics research, including S. cerevisiae, C. elegans, and D. melanogaster, together with homologous human proteins. Genes conserved across this set of genomes are likely to be biologically and/or biomedically important. It will also be valuable to correlate the structural and biochemical function analyses of these proteins with the extensive biological data emerging from large-scale functional genomics efforts. Within these genomes, the pilot project will focus on proposed open reading frames (ORFs) encoding phylogenetically conserved polypeptide chains of fewer than 340 amino acids with no predicted 3D structure.
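The target-selection criteria in the last sentence (conserved, fewer than 340 amino acids, no predicted 3D structure) amount to a simple filter. The sketch below is a minimal Python illustration under stated assumptions, not the consortium's actual pipeline. It assumes each ORF has already been assigned to a conserved-family cluster (the clustering step of item i is not shown), and it takes "phylogenetically conserved" to mean that the family spans at least `min_organisms` of the target genomes; the abstract does not define that threshold, so the default of 2 is an assumption. The `Orf` record, its field names, `select_targets`, and the example ORF IDs are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


# Hypothetical record for one candidate ORF; field names are illustrative only.
@dataclass
class Orf:
    orf_id: str
    organism: str
    length_aa: int
    family_id: Optional[str]       # assumed conserved-family cluster ID; None if unclustered
    has_predicted_structure: bool  # assumed flag: True if a 3D structure is already predicted


MAX_LENGTH_AA = 340  # "fewer than 340 amino acids" (strict inequality)


def select_targets(orfs: List[Orf], min_organisms: int = 2) -> List[Orf]:
    """Keep short, conserved ORFs with no predicted 3D structure (hypothetical helper)."""
    # Record which distinct organisms each family appears in.
    organisms_per_family: Dict[str, Set[str]] = defaultdict(set)
    for orf in orfs:
        if orf.family_id is not None:
            organisms_per_family[orf.family_id].add(orf.organism)

    selected = []
    for orf in orfs:
        # Assumption: "conserved" means the family spans >= min_organisms genomes.
        conserved = (
            orf.family_id is not None
            and len(organisms_per_family[orf.family_id]) >= min_organisms
        )
        if conserved and orf.length_aa < MAX_LENGTH_AA and not orf.has_predicted_structure:
            selected.append(orf)
    return selected


# Made-up example data for illustration only.
candidates = [
    Orf("orf_a1", "S. cerevisiae", 210, "fam1", False),
    Orf("orf_a2", "C. elegans", 225, "fam1", False),
    Orf("orf_a3", "D. melanogaster", 500, "fam1", False),  # too long
    Orf("orf_b1", "S. cerevisiae", 150, "fam2", False),    # family found in one genome only
    Orf("orf_c1", "C. elegans", 180, "fam3", True),        # structure already predicted
    Orf("orf_c2", "H. sapiens", 190, "fam3", False),
]

print([o.orf_id for o in select_targets(candidates)])
# prints ['orf_a1', 'orf_a2', 'orf_c2']
```

The length check uses a strict `<` to match "fewer than 340", so a 340-residue ORF is excluded. Conservation is computed per family rather than per ORF, so a long or already-predicted member of a family still counts toward that family's conservation even though it is not itself selected.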