Gene expression data, generated by current microarray technology, are a potential source of profound knowledge and insight into the human biological condition. Microarray data are a mass collection of facts and figures that must be organized, summarized, modeled, analyzed and interpreted to yield useful conclusions. The goal of this proposed research program is to develop statistical thinking and methods that will allow useful scientific conclusions to be drawn from gene expression data. This investigation will consider the metric and distributional properties of expression measurements. Appropriate data transformations will be considered, as well as imputation methodology for missing and sub-threshold measurements. Sound techniques for handling background noise in the measurement process will be developed. We will develop methods for managing massive gene expression data sets. We will tailor and apply sound data mining methods, such as classification, regression, dependency modeling, clustering and graphical techniques, to these data for the discovery of characteristics and relationships of potential scientific value. We will also develop and apply sound statistical inference methods to gene expression data. This aim addresses the confirmatory, as opposed to the exploratory, aspects of the statistical research. The analysis of gene expression data has not yet been put on a solid statistical footing with respect to extracting valid inferences within the context of an explicit statistical model. Techniques used to date have been mainly exploratory and descriptive. This project will carry out the necessary research on inference issues. It is anticipated that generalized linear models will play an important role.
Relationships, patterns and characteristics of gene expression data are revealed more precisely when appropriate adjustments are made for covariates, markers and treatment indicators. Generalized linear models provide a flexible framework for representing these adjustments. The research program will develop these models for gene expression data, taking their unique characteristics into account. It is also anticipated that a full inferential structure for gene expression data will require a Bayesian approach. This approach will be examined in the project.
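As a purely illustrative sketch of the kind of covariate adjustment described above, the following fits the simplest generalized linear model (Gaussian response, identity link, which reduces to ordinary least squares) to the log expression of a single gene. Everything in it is an assumption made for illustration and is not taken from the proposed research: the function names `solve_linear_system` and `fit_gaussian_glm`, the choice of an intercept, a treatment indicator and one covariate as predictors, and the noise-free toy data.

```python
# Hypothetical illustration only: names and data are invented for this sketch.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_gaussian_glm(design, response):
    """Gaussian GLM with identity link: solve (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy design matrix; columns: intercept, treatment indicator, covariate.
design = [
    [1.0, 0.0, 3.0],
    [1.0, 0.0, 5.0],
    [1.0, 0.0, 6.0],
    [1.0, 1.0, 3.5],
    [1.0, 1.0, 4.5],
    [1.0, 1.0, 6.5],
]

# Made-up coefficients used to generate noise-free log expression values.
true_beta = [1.0, 2.0, 0.5]
log_expression = [sum(b * x for b, x in zip(true_beta, row)) for row in design]

beta_hat = fit_gaussian_glm(design, log_expression)
# beta_hat[1] is the treatment effect adjusted for the covariate.
```

Because the toy responses contain no noise, the fit recovers the generating coefficients up to rounding. Real expression data would call for a response distribution and link function suited to their distributional properties, which is the question the research program sets out to examine.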