The goal of this proposal is to develop improved methods for statistical inference from data arising in genomic studies, specifically from microarray platforms. Statistical algorithms, particularly those based on Markov chain Monte Carlo (MCMC), have become widely used in data analysis across many fields. In genomic studies they have become particularly prevalent, in part because of the enormous amount of data collected and the ability of these algorithms to handle complex models. We address three specific aims:
Specific Aim 1: Develop missing data methods applicable to SNP association genetics. In such studies, where the goal is to associate a quantitative trait with SNPs, information is typically collected on a large number of SNPs. Because this information is typically incomplete, we must deal with missing data, which raises two difficulties: (i) accurate modeling must take into account the SNP correlation structure, which causes problems for standard missing data methods, and (ii) the large number of SNPs brings both computational and statistical problems. We are developing a Gibbs sampler that shows great promise in allowing efficient estimation of SNP effects in these problems.
Specific Aim 2: Clustering and classification methods for time-course microarray data. We continue our development of clustering methods for time-course data based on Bayesian hierarchical models and a Metropolis-Hastings search algorithm, with the specific goal of developing a new classifier that associates clusters, or gene patterns, with clinical outcomes.
Specific Aim 3: Testing for the existence of clusters. Although there are many methods for clustering data, there are few methods for assessing whether the resulting clusters are significant. We propose a Bayesian model selection methodology to derive a test for the existence of clusters.
As many phenotypes show quantitative variation, detection of clusters is a preliminary step that would suggest further genomic analysis to determine the existence of SNPs controlling the observed quantitative traits.
PUBLIC HEALTH RELEVANCE: The methods to be developed are motivated by a number of studies that promise to have an impact on disease management. In particular, we plan to apply our missing data methods to a SNP discovery data set from lupus patients to find associations between SNPs and disease status, and our gene-based classifier can aid physicians in managing the treatment of trauma patients. The proposed cluster test can serve as a screening tool to identify data sets with possible genetic associations, flagging them for follow-up association analysis.