Osteoarthritis (OA) is highly prevalent, contributes to substantial morbidity in the population, and lacks effective interventions to prevent onset and progression. Importantly, like many other chronic conditions, OA is not a single disease but rather a heterogeneous condition consisting of multiple subgroups, or phenotypes, with differing underlying pathophysiological mechanisms. It is becoming increasingly clear that consideration of specific OA phenotypes in clinical studies and trials is critically needed to move the field forward. Successful treatments for OA will need to be targeted to, and tested in, specifically chosen OA phenotypes. The overall goal of this line of work is to identify and understand potential phenotypes of knee osteoarthritis (KOA) to better inform future research efforts and treatments; this exploratory R21 project, using OA Initiative (OAI) data, will investigate novel methodology to support phenotyping in KOA. Our hypothesis is that an understanding of KOA phenotypes, a key step toward Precision Medicine in OA, will lead to more successful clinical studies in the long term. To approach this important clinical problem, we will apply innovative machine learning methods and validation strategies to data from the large, publicly available OAI cohort, drawing on local expertise in statistics, biostatistics, and machine learning methodology to phenotype this heterogeneous disease. In Aim 1, we will use a data-driven, unsupervised learning approach to cluster features that best define and discriminate among phenotypes of KOA in the OAI dataset, using biclustering and a novel significance test (SigClust) developed by co-I Marron.
In Aim 2, we will test specific hypotheses of relevance to OA outcomes, such as differences between those with and without OA, or between those who do and do not develop new or worsening disease, using another set of machine learning methods (Direction-Projection-Permutation [DiProPerm] hypothesis testing and Distance-Weighted Discrimination [DWD]), also developed by co-I Marron, in the full cohort and in any clusters identified in Aim 1. To address these aims, this proposal involves interdisciplinary collaborations among experts in statistics, biostatistics, computer science, rheumatology, and epidemiology. This work will significantly impact the field by fulfilling a critical need to accurately define OA phenotypes, discover the key features associated with these phenotypes, link phenotype subgroups to underlying mechanisms, and use this information to inform and focus future clinical studies. In the long term, we expect that this strategy will lead to more personalized and successful management of the millions of people affected by OA.