This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. Selecting the relevant predictor variables is crucial to building an accurate predictive regression model. DNA microarray data contain thousands of genes as predictor variables, far more than the number of observations in an experiment. Grouping highly correlated variables is advantageous because it both simplifies and improves the predictive model. The overall goal of this research is to compare three techniques for finding the optimal grouping of related genes for regression: hierarchical clustering, the lasso, and the elastic net. In addition, two outcome prediction models will be compared. Modeling and predicting changes at the DNA level will assist in understanding the development of complex diseases. Reducing the number of variables in the model will make the results easier to interpret.
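
To illustrate why the treatment of highly correlated genes matters when comparing the lasso and the elastic net, here is a minimal sketch in plain Python. It is not the investigators' method or code. The function names (soft_threshold, elastic_net), the parameters (lam, alpha, sweeps), and the toy data are all assumptions introduced for illustration. The sketch uses the standard penalized least-squares objective, fitted by coordinate descent, on two identical made-up "genes".

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=200):
    """Coordinate descent (hypothetical helper) for the objective
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    alpha=1.0 gives the lasso; 0 < alpha < 1 gives the elastic net.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(v * (r + v * coef[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            residual = [r - v * (new - coef[j]) for v, r in zip(col, residual)]
            coef[j] = new
    return coef


# Toy data, made up for illustration: two identical "genes", y = 2 * gene.
gene = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[g, g] for g in gene]
y = [2.0 * g for g in gene]

lasso_coef = elastic_net(X, y, lam=0.5, alpha=1.0)
enet_coef = elastic_net(X, y, lam=0.5, alpha=0.5)
print("lasso      :", lasso_coef)
print("elastic net:", enet_coef)
```

In this toy case the lasso puts all of the weight on the first copy of the gene and sets the second to zero, while the elastic net gives both copies the same coefficient. Real microarray data would need standardized predictors and a cross-validated choice of lam and alpha, neither of which is shown here.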