Despite their adverse impact on patient quality of life and on healthcare utilization and costs, symptom clusters (SCs) in common adult chronic conditions such as cancer, heart failure (HF), type 2 diabetes mellitus (T2DM), and chronic obstructive pulmonary disease (COPD) are understudied and poorly understood. The lack of access to real-world, longitudinal patient symptom data sets and the inability to adequately model the complexity of SCs have greatly limited research. Based on our previous work, we propose that these gaps can be addressed in an innovative way using electronic health records (EHRs) and data science techniques. Our overall objective is to develop, apply, refine, and implement an optimized data processing and analysis pipeline that uses EHR data to characterize SCs in common adult chronic conditions. We hypothesize that a core set of SCs is shared among all common adult chronic conditions and that distinct SCs characterize specific conditions and/or treatments. The long-term training goal of this project is to assist Dr. Koleck in becoming an independent investigator conducting a program of research dedicated to mitigating symptom burden in patients with chronic conditions through the use of informatics and omics (e.g., genomics and proteomics), the focus of her pre-doctoral work. Drawing on exceptional resources available at Columbia University, the K99 phase of this project will focus on the development of a rigorous pipeline; essential competencies in SC analysis and interpretation; and the data science techniques of clinical data mining, natural language processing, machine learning, and data visualization. In the R00 phase, Dr. Koleck will independently implement the pipeline in another medical center to determine the reproducibility of identified SCs and begin to explore clinical predictors (e.g., socio-demographics, laboratory results, and medications) of SCs.
The specific aims are to 1) develop a data-driven pipeline for the characterization of SCs from EHRs using a cohort of adult patients diagnosed with cancer, as SCs have been most systematically characterized in this condition; 2) apply the pipeline to three other common adult chronic conditions that share biological and behavioral risk factors with cancer, i.e., HF, T2DM, and COPD, and evaluate SCs in these conditions; and 3) determine whether SCs differ for cancer, HF, T2DM, and COPD when the pipeline is implemented within another medical center and explore clinically relevant, EHR-documented predictors of identified SCs. To accomplish the research aims and training goals, an interdisciplinary team of scientists with expertise in symptom science, biomedical informatics, data science, pertinent clinical domains, and career development mentorship has been assembled. This research is significant because a pipeline that accommodates the format in which symptom data are already being documented in EHRs has the potential to greatly accelerate the acquisition of SC knowledge and expedite clinical translation of symptom mitigation strategies. Given the array of new competencies to be developed, this K99/R00 award is necessary for achieving the candidate's career goal of advancing chronic condition symptom science.