Knowledge-based biomedical data science

In the previous funding period, we designed and constructed breakthrough methods for creating a semantically coherent and logically consistent knowledge-base by automatically transforming and integrating many biomedical databases, and by directly extracting information from the literature. Building on decades of work in biomedical ontology development, and exploiting the architectures supporting the Semantic Web, we have demonstrated methods that allow effective querying across any combination of data sources in purely biological terms, without the queries having to reflect the structure or distribution of information among the sources. These methods can also represent apparently conflicting information in a logically consistent manner and track the provenance of every assertion in the knowledge-base. Perhaps their most important feature is that they scale to potentially include nearly all knowledge of molecular biology.

We now hypothesize that, using these technologies, we can build knowledge-bases with coverage broad enough to overcome the "brittleness" problems that stymied previous approaches to symbolic artificial intelligence, and then create novel computational methods that leverage that knowledge to provide critical new tools for the interpretation and analysis of biomedical data. To test this hypothesis, we propose to address the following specific aims:

1. Identify representative and significant analytical needs in knowledge-based data science, and refine and extend our knowledge-base to address those needs in three distinct domains: clinical pharmacology, cardiovascular disease and rare genetic disease.
2. Develop novel, and implement existing, symbolic, statistical, network-based, machine learning and hybrid approaches to goal-driven inference from very large knowledge-bases. Create a goal-directed framework for selecting and combining these inference methods to address particular analytical problems.
3. Overcome barriers to broad external adoption of the developed methods by analyzing their computational complexity, optimizing the performance of knowledge-based querying and inference, developing simplified, biology-focused query languages and lightweight packaging of knowledge resources and systems, and addressing issues of licensing and data redistribution.