The primary goal of this project is to develop a novel, integrated approach for the analysis of high-throughput cancer genomic data. We plan to develop new variable selection methods for 1) class discovery, in which we propose to identify subgroups of a specified cancer to better understand the underlying cancer biology, and 2) predictive gene signatures, in which we propose to identify a subset of genes that is predictive of patients' clinical phenotypes, including survival and response to therapy. First, we will develop a new method for variable selection in clustering. Clustering plays a critical role in the analysis of genomic cancer data. For example, based on gene expression profiles, important cluster distinctions can be found among a set of tissue samples, which may reflect categories of disease, mutation status, or different responses to a given therapy. Second, we will develop a new penalized-likelihood method for variable selection in regression that uses group information to select groups of correlated genes that share the same biological pathway. The developed methodology will be useful for identifying important gene signatures that may lead to more selective, personalized treatment in any health study where survival time or response to therapy is of interest.