The primary goal of this project is to develop a novel, integrated approach for the analysis of high-throughput cancer genomic data. We plan to develop new variable selection methods for 1) class discovery, that is, determining subgroups of a specified cancer to better understand the underlying cancer biology, and 2) predictive gene signatures, that is, determining a subset of genes that is predictive of patients' clinical phenotypes, including survival and response to therapy. First, we will develop a new method for variable selection in clustering. Clustering plays a critical role in the analysis of genomic cancer data. For example, based on gene expression profiles, important cluster distinctions can be found among a set of tissue samples, which may reflect categories of disease, mutation status, or different responses to a given therapy. Second, we will develop a new penalized-likelihood method for variable selection in regression that uses group information to select groups of correlated genes that share the same biological pathway. The developed methodology will be useful for identifying important gene signatures that may lead to more effective personalized treatment in any health study where survival time or response to therapy is of interest. PUBLIC HEALTH RELEVANCE: Our project aims to develop a new class of variable selection methods for analyzing high-throughput cancer genomic data. Compared to existing methods, the proposed methods will provide more powerful class discovery for identifying cancer subtypes and more accurate prediction of patients' survival and response to therapy. The developed methodology will be useful for identifying important gene signatures that may lead to more effective personalized treatment in any health study where survival time or response to therapy is of interest.