Recent years have seen exciting breakthroughs in the application of computational methods to problems in cancer genomics. Machine learning techniques applied to gene expression data have been used to distinguish tumor morphology, predict post-treatment outcome, and find molecular markers for disease. While these studies have been very promising, significant challenges remain. Extracting biological knowledge from microarray-based gene expression data is difficult. Developing robust and accurate expression-based classifiers of biological and clinical states is similarly problematic. In addition, biologists do not have access to an integrated set of robust, sophisticated analytical tools. Our goal is to develop, implement, and distribute computational genomics methods that address these challenges in the gene expression profiling field.

Aim 1: Capture the behavior of sets of genes that represent a pathway or state of the cell, reducing a list of thousands of expressed genes to a few hundred metagenes. Metagenes should filter out the noise, technical variation, and idiosyncrasies of the data, and capture the actual molecular logic or the relevant biological correlations and structure in the data.

Aim 2: Develop a robust and validated computational methodology for using metagene markers for classification.

Aim 3: Develop and distribute an integrated software package, GenePattern, to put the power of sophisticated computational methods into the hands of the biomedical research community.

Our extensive experience in developing methods, analyzing patient sample data, and creating and distributing software tools for this area of research makes us well suited to carry out this program.