PROJECT SUMMARY

Research Project: Chronic lung diseases account for over 100,000 deaths a year in the United States. Their pathogenetic underpinnings are known to be highly heterogeneous, which underscores the importance of identifying disease subtypes. In recent years, integrative analyses of multi-omics data, including transcriptomic gene expression data and exome- or genome-wide DNA sequence variant data, have successfully identified molecular subtypes of many diseases that can predict a patient's response to cytotoxic and biologic treatments. This promises a future of biomarker-driven personalized treatments that will improve outcomes, reduce toxicity, and lower cost. In contrast, comparable analyses have not yet been realized for chronic lung diseases. This is partly due to the complexity of the genetic and transcriptomic perturbations that contribute to these diseases. In addition, not all disease-relevant exomic or genomic sequence variants are expressed in a chronically diseased organ, so much of the exome or whole-genome sequencing data is irrelevant to any specific disease. Our team has been analyzing large-scale transcriptomic data to identify disease heterogeneity, and our previous studies in asthma suggested that integrating longitudinal transcriptional data with genetic sequence variant data from the same subjects would significantly enhance discovery in chronic inflammatory diseases, and specifically in lung diseases. In addition, predefined biological pathway information can substantially reduce the data dimension and enrich for signals of molecular endotypes of lung diseases.
Taken together, we hypothesize that integrative analysis of longitudinal gene expression and genetic sequence variation in lung-derived RNA, combined with prior pathway information, will identify disease heterogeneity that is more strongly associated with important clinical features of disease than the heterogeneity identified by integrating gene expression and prior pathway information alone. To test this hypothesis, we propose to 1) develop disease-specific methods to identify sequence variants from longitudinal RNA sequencing data derived from lung tissue; and 2) develop novel statistical models that integrate the genetic sequence variants, the longitudinal transcriptional signatures from the same dataset, and biological pathways to identify endotypes of chronic lung diseases, including asthma and sarcoidosis.

Environment and Collaborators: I will carry out the proposed research with Dr. Hongyu Zhao, in collaboration with Drs. Geoffrey L. Chupp and Naftali Kaminski, a team of experienced, committed experts in statistical genomics and genetics, pulmonary medicine, and translational research. The team has a record of successful collaboration, and each member brings unique expertise. The datasets for our main study populations will be generated primarily in Dr. Chupp's and Dr. Kaminski's labs in the PCCSM section. Validation of the discoveries and downstream functional studies will also be conducted in the Chupp and Kaminski labs.