We propose to make the growing body of experimental three-dimensional (3D) RNA structure data more useful to biomedical researchers by providing improved methods to integrate 3D RNA structure with sequence and other experimental data. New annotation tools and services developed in this project will be integrated into the Nucleic Acid Database (NDB), which will provide a platform for disseminating project results. Among the expected benefits are better methods for 1) predicting the 3D structures of functional RNA motifs from sequence, 2) searching genomes for non-coding RNA genes, and 3) aligning homologous RNA sequences. We focus on recurrent, modular RNA 3D motifs, which occur in a wide variety of structured RNA molecules and give them their distinctive 3D shapes. These motifs include hairpin loops, internal loops, junction loops, and tertiary interaction motifs. We will develop systematic methods to identify, classify, and name recurrent RNA 3D motifs, and to define search criteria that reliably find instances of each motif in 3D structures. An annotation procedure will be established so that new motifs are rapidly identified in new structures and vetted in collaboration with other members of the RNA Ontology Consortium. All experimental RNA 3D structures will be annotated with lists of the motifs they contain. A Motif Atlas will be created to make information about motif instances in 3D structures available to users. The Atlas and its motif annotations will be added to the NDB, a web resource containing structural and functional annotation of nucleic-acid-containing macromolecules. An update procedure will be developed so that motif data and Atlas entries are automatically added to the NDB as new RNA structures become available in the PDB archive.
We will extend the query capabilities of the NDB with tools that let users search for RNA motifs using multiple criteria and integrate the search results with experimental confidence measures. We will maintain statistics on the occurrences of motifs and base-pairing interactions, incorporating experimental confidence measures, and make these data available as a resource for structure refinement and validation tools. Each entry in the Motif Atlas will include a structural alignment of all instances of the motif, revealing the sequence variants of each motif, including patterns of insertions and deletions. These data will be combined with statistical covariation data for Watson-Crick and non-Watson-Crick base pairs, and with statistical data for base-stacking and base-backbone interactions, to develop probabilistic models of the sequence variability of each modular RNA 3D motif. These models will underpin a web-based tool that identifies the Motif Atlas motif that best matches the hairpin, internal, or junction loop sequences that users submit.

PUBLIC HEALTH RELEVANCE: Recent work shows that most of the human genome is transcribed, that most of the RNA produced does not code for protein, and that a large fraction of this non-coding RNA is critical for human reproduction, growth, and development. This proposal aims to make the growing body of experimental three-dimensional (3D) RNA structure data more useful to the biomedical research community by providing improved methods to integrate 3D RNA structure with sequence and other experimental data.