Increasingly large and diverse data sets are being generated by publicly funded screening centers using various high- and low-throughput screening technologies. Much of this data is publicly accessible. The largest public repository of small-molecule screening results is PubChem, currently covering over 1,500 assays for 370,000 compounds. The number of publicly available assays is expected to grow more than 10-fold during the next five years. The utility of this invaluable resource is currently limited because the knowledge contained in complex and diverse bioassay data sets is not formalized and therefore cannot be accessed for comprehensive computational analysis or integration with other data sources. This proposal addresses that limitation. For the past ten years, biologists have developed ontologies to facilitate the analysis and discussion of the massive amounts of information emerging from the various genome projects. An ontology is a controlled-vocabulary representation of the objects and concepts in a domain, together with their properties and relationships. Its purpose is to model and share domain-specific knowledge so that software agents can automatically extract and associate information. The aim of this proposal is to develop a bioassay ontology and supporting software tools, and to demonstrate their utility. The bioassay ontology will coherently describe diverse biological assays (such as those in PubChem), with a focus on complex cell-based assays and, in particular, high-content screening data. Software development will include modules to build ontology terms and to curate data sets; tools to map the ontology onto screening experiments and other ontologies; and tools to standardize, reformat, and aggregate data sets in the context of the ontology. We will demonstrate the utility of our approach by creating a PubChem-derived database and making it available to the community via a search interface.
The ontology and software tools will facilitate the analysis of bioassay screening data in various contexts, for example signaling or metabolic pathways and, indirectly, human disease. The tools will enable users to extract data sets for modeling specific interactions between perturbing agents and biological targets (or pathways), or for modeling assay technology-dependent interferences. The end-user software must make it easy for biologists and chemical biologists to apply the ontology to their own and external data sets. It will be modular and open source. We will develop collaborations to disseminate the bioassay ontology and software in the community and to facilitate their ongoing development.
PUBLIC HEALTH RELEVANCE: This project will develop a bioassay ontology to coherently describe the hundreds of different assays used to study how perturbing agents, such as drugs, alter cell function. Together with new software to search existing assay databases, the ontology will enable scientists to identify and prioritize chemicals more effectively for further development into chemical probes or starting points for therapeutics.