DNA microarrays are a new and promising biotechnology that allows the expression levels of thousands of genes to be monitored simultaneously. Microarrays are increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors and the study of host genomic responses to bacterial infections. The broad, long-term objectives of this project are to develop novel statistical methods for the design and analysis of DNA microarray experiments. The specific aims of the proposal fall into four areas, all of which are concerned with improving the efficiency and reliability of microarray experiments, from the early design and pre-processing stages to higher level analyses.

I. Experimental design. Proper experimental design is essential to ensure that biological questions are answered accurately and precisely given experimental constraints. Flexible designs and methods of analysis will be developed for time-series and multifactorial experiments, which monitor the gene expression response over time for factors such as treatment and cell type.

II. Pre-processing. Image analysis and normalization are components of all microarray experiments and can have a substantial impact on higher level analyses. Spot and slide quality statistics will be derived, as well as procedures for incorporating these statistics into subsequent analyses. Normalization methods based on robust local regression are proposed to accommodate different types of dye biases and to exploit control sequences spotted on the array.

III. Pattern discovery and recognition. New methods for clustering, discrimination, and multiple testing are proposed in order to elucidate associations between gene expression levels and other covariates or responses.
These include assessing the effects of treatment interventions, discovering temporal or spatial gene expression patterns, and identifying genes associated with clinical outcomes such as cancer incidence and survival.

IV. Software development. Statistical methods developed as part of this project will be implemented in packages built on the R language for statistical computing. To facilitate use and integration with biological information resources, a web-browser interface will be provided.