Project 3 will develop new approaches for analyzing the genetic, epigenetic, expression, and clinical profiles provided by Projects 1 and 2 and Core B, as well as by related external databases. For gene discovery and outcome prediction, we will develop methods of analysis and use them to select features for a colon cDNA microarray. A second analysis will identify genes for DNA and functional profiling. By relating gene expression profiles, DNA markers, and the clinical progression of lesions from aberrant crypt foci (ACF) to metastasis, we will predict tumor outcome and discover new cancer pathway genes. Our specific aims are: (i) To determine which molecular alteration(s) identified in Project 1 and Project 2 correlate with recurrence and survival. Post-surgical prognosis is related to the development of distant metastasis. We will use biostatistical methods to correlate genetic, epigenetic, and expression changes with disease-free survival, pattern of recurrence, and disease-specific survival following potentially curative resection of Stage II/III primary cancers. (ii) To develop a molecular taxonomy of colorectal cancer by relating concerted patterns of gene expression to clinical and genetic information through cluster analysis. This aim will develop unsupervised and partially supervised methods, which can identify unanticipated structure in the data and thus complement classical statistical analysis. One of the most difficult problems in classification and clustering analysis is the multiplicity of data types; we will develop methods to overcome this problem. (iii) To quantify the relationship between the genotype and phenotype of colorectal cancer using neural network analysis. We will develop analytical techniques for evaluating gene-tumor relationships using supervised neural networks, and we will apply these techniques to gene class discovery, gene class prediction, and functional modeling in concert with Projects 1 and 2.
We will also explore extending neural network models to gene regulatory networks that describe disease dynamics. (iv) To provide data management for Projects 1 and 2, including cDNA handling and the testing and scoring of samples. We will combine data from Projects 1 and 2 and Core B into a central DataMall (Princeton University). Data coordinators at MSKCC and Cornell will transfer information to the DataMall. These data will be available to researchers, and later to the public, through a WWW interface. There will be integral links to protein, genetic, expression, and cancer pathway databases.