This proposal involves the development and evaluation of novel statistical methods for the design and analysis of high dimensional biological studies. By 'high dimensional' biology, we refer to studies in which the number of variables being studied is large, often in the thousands. This high dimensionality can involve any combination of thousands of genetic polymorphisms, gene expression levels, protein measurements, or exposure variables, as in nutrient or toxicology studies. Such high dimensional studies pose unique challenges because they are often coupled with small sample sizes. High dimensional biology potentially allows biologists to make major leaps forward by increasing the number of factors they can examine and, importantly, by offering the opportunity to study the complex relations and coordination among entire genomes, proteomes, and metabolomes rather than single factors in isolation. For these technologies to achieve their potential, a richer understanding of the statistical properties of the data they produce must be developed, and statistically sound methods for analyzing such data must be developed and made available to biologists. Writings on the analysis of high dimensional biological data have been passing through overlapping phases of increasing sophistication. Taking the microarray field as prototypic, the earliest writings consisted almost exclusively of simple descriptive statistics and a variety of data display techniques without rigorous statistical properties. These were followed by creative and intuitively appealing procedures developed by computationally sophisticated bioinformaticians, but these methods often had no strong statistical basis. A third, emerging phase recognizes that the data produced by microarrays are not inherently different from other measurements of random variables.
Though new procedures tailored to the nuances of microarray, proteomic, and other high dimensional data are called for, they must be built on the same sound foundation of probability and statistical principles that apply to other types of analysis. We propose herein to develop and evaluate a variety of novel statistical methods to meet this challenge with the highest standards of rigor. These methods will then be available to other project investigators within the proposed center and to the field as a whole.