The major aims of this pilot project are the development and critical analysis of methods for high-throughput, proteome-scale, eukaryotic protein production, characterization, and structure determination. This technology will be tested and refined through the determination of crystal- and solution-state three-dimensional structures of proteins from Arabidopsis thaliana. Particular emphasis will be placed on proteins whose sequences suggest that they may contain a novel fold, proteins associated with novel functions, and proteins likely to have a known fold but with a function not previously associated with that fold. Targets will be reprioritized periodically to take into account developments, here and elsewhere, in fold/function recognition, competitive target selection, and the practicalities of sample preparation and structure determination. It is envisioned that most of the structures will be determined by X-ray crystallography, with NMR used for smaller, more dynamic proteins and for small targets that fail to crystallize. Membrane proteins, small RNA molecules, and RNA-protein complexes, while not currently amenable to high-throughput methods, will be investigated as minor research targets. Approaches will be explored that promise increased efficiency and lower costs for protein production, characterization, and structure determination. Emphasis will be on process development, with all steps, including administration and costs, under regular review. Automation and robotics will be phased in at each step where feasible. Promising new approaches will be tested head-to-head against methods already known to work.
A web-based project management system will: (1) organize data relevant to the overall management of the project, (2) record information on the methodology employed in individual steps, (3) launch computer-controlled processes, (4) harvest intermediate and final results, (5) provide detailed web-accessible views of the project to members of the Center for Eukaryotic Structural Genomics and its Advisory Committee, (6) update the publicly accessible tracking web site, and (7) assist with data deposition and electronic publication of results. All finished structures and other products, such as plasmids, proteins, protocols, and computer software, will be made available to the public. Information learned about dynamics and disordered sequences will be organized in a database. The project will be interfaced with functional proteomics initiatives on A. thaliana that are to be funded separately. The methods developed by the Center will be applicable to other eukaryotic genomes, including the human genome.