A holy grail of bioinformatics is the creation of whole-cell models with the ability to enhance human understanding and facilitate discovery. To this end, a successful and widely-used effort is the Gene Ontology (GO), a massive project to manually annotate genes into terms describing molecular functions, biological processes and cellular components and provide relationships between terms, e.g. capturing that small ribosomal subunit and large ribosomal subunit come together to make ribosome. GO is widely used to understand the function of a gene or group of genes. Unfortunately, GO is limited by the effort required to create and update it by hand. It exists only for well-studied organisms and even then in only one, generic form per organism with limited overall genome coverage and a bias towards well-studied genes and functions. It is not possible to learn about an uncharacterized gene or discover a new function using GO, and one cannot quickly assemble an ontology model for a new organism, let alone a specific cell-type or disease-state. This proposed research will change this state of affairs. Already, work has shown that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to computationally infer an ontology whose coverage and power are equivalent to those of the manually-curated GO Cellular Component ontology. Still, this first attempt was limited in the types of experimental data used and its ability to infer the more generally useful Biological Process ontology. Here machine learning approaches will be applied to integrate many types of experimental data into ontology model construction and analyze the type of biological information provided by each experiment, revealing those experiments most informative for capturing Biological Process information. Furthermore, the high-throughput experimental data to ontology paradigm explored here will be used to develop a computational tool to highlight novel types of hypotheses that are inaccessible by current high-throughput experimental data analysis methods. Preliminary work has shown GO to be useful for prediction of synthetic lethal pairs of genes, i.e. genes that are individually non-essential but when knocked out together cause cell death. Given the high mutation rate in cancer, these pairs provide potential cancer drug targets, as a drug may target a gene product which is now essential in the mutated cancer cells but not other cells, thereby killing only cancer cells. Because data-driven ontologies are not as hindered by issues with bias and coverage and are specifically designed to capture only functional relationships, this proposal will explore the idea that data-driven ontologies will be better suited to help predict synthetic lethal pairs than GO. To this end, algorithms will be developed to construct a data-driven ontology of yeast DNA repair and use this ontology to predict synthetic lethal pairs of genes. Overall, this proposal will develop the computational and experimental roadmap to construct a whole-cell model of gene function - an ontology - and use the model to discover useful biology - synthetic lethal pairs.