PROJECT SUMMARY/ABSTRACT
Today's healthcare infrastructure supports the production and storage of clinical data on a massive scale. A central goal in clinical informatics is to leverage these data to improve our understanding of health and disease. However, a major challenge is the paucity of reliable disease labels in observational data. Disease phenotypes address this issue by summarizing the characteristics of specific diseases in terms of commonly observed clinical variables. Classically, disease phenotypes are engineered via a manual, expert-driven approach that fails to scale to large numbers of diseases. Data-driven methods for disease phenotyping aim to obtain large numbers of disease phenotypes by directly modeling large-scale observational clinical data. Such high-throughput methods may scale, but generally cannot guarantee identifiability; that is, inferred phenotypes are not guaranteed to map to specific diseases. In addition, data-driven disease phenotyping methods generally model phenotypes independently, making no effort to capture the relationships among diseases implied by our understanding of comorbidities, disease progression trends, and disease type/subtype relationships. The long-term goal of the proposed research is to support large-scale analysis of observational clinical data by introducing a family of closely related models for high-throughput disease phenotyping that resolve the issue of identifiability and model relationships among diseases. My work is inspired by UPhenome, an unsupervised probabilistic graphical model for high-throughput phenotyping. My objective is to derive, implement, validate, and disseminate UPhenome-based models that will 1) process both biomedical knowledge and clinical data to yield identifiable phenotypes and 2) model co-occurrence, temporal, and hierarchical relationships among inferred phenotypes.
My central hypothesis is that UPhenome-based models can support large-scale clinical data analysis by inferring phenotypes that effectively represent the clinical characteristics of specific diseases while also capturing common comorbidities (co-occurrence model), capturing patterns of disease progression (temporal model), and organizing diseases into types and subtypes (hierarchical model). To test this hypothesis, I propose the following aims. Aim 1: I describe Guided UPhenome, a model which processes biomedical knowledge and clinical data to yield identifiable phenotypes. The model's capacity for capturing disease-specific traits is evaluated qualitatively by clinical experts, and quantitatively in disease-specific cohort selection tasks against a gold standard and a competing algorithm. Aim 2: I detail extensions to UPhenome which allow for modeling of disease relationships. The meaningfulness of these relationships is evaluated qualitatively using a series of custom "intrusion tasks" inspired by the topic modeling literature. Aim 3: I will disseminate UPhenome-based models by ensuring their compatibility with the Observational Medical Outcomes Partnership (OMOP) common data model, and promoting their adoption within the Observational Health Data Sciences and Informatics (OHDSI) community.