SUMMARY-PROJECT 6
The primary objective of the Conte Center is to better understand the causes of suicidal behavior and, ultimately, to translate that knowledge into tools with the potential to detect patients at higher risk for more lethal suicide attempts. Toward that aim, a great deal of valuable data is being collected. This project is primarily concerned with very high-dimensional data, including brain imaging data, DNA genetic and epigenetic data, and gene expression data. In particular, we are interested in using such data as predictors in statistical models. Our ultimate goal is to build and implement models that can distinguish depressed suicide attempters from other depressed patients. The greatest hurdle is the very high dimensionality of the data, a situation that classical statistical modeling techniques are not equipped to handle. Our approach to modeling with such data seeks to achieve three primary goals: (1) to provide good predictive accuracy; (2) to build interpretable models that give meaningful insight into the relationship between the various predictors and the outcome variable; and (3) to ensure stability of the models through validation studies. To accomplish these goals, our approach will incorporate a regularization component in the model selection and fitting process by applying a penalty to model complexity. In applications, the amount of penalization will be determined by cross-validation or a related technique. In both model development and application, we will consider a wide range of outcome variables, including continuous or nearly continuous (e.g., severity of depression, aggressive traits), ordinal (e.g., lethality of suicide attempts), categorical (e.g., diagnosis), and binary (e.g., suicide attempt, yes/no) outcomes, as well as time-to-event data (e.g., time to a suicide attempt), which will draw on survival analysis techniques.
This project focuses specifically on situations in which the number of possible predictors is in the tens of thousands or more. Examples of data already collected or to be gathered as part of the Conte Center include brain imaging data (positron emission tomography [PET] images of the serotonin system, diffusion tensor imaging [DTI] scans, functional magnetic resonance imaging [fMRI] scans, and magnetic resonance spectroscopy [MRS] scans), genome-wide DNA genotyping and methylation data, and gene expression data.
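
The regularization strategy described above (a penalty on model complexity, with the amount of penalization chosen by cross-validation) can be illustrated with a minimal sketch. The summary does not specify the form of the penalty, so the L1 (lasso) penalty used here is an assumption, as are the continuous outcome, the simulated data, and all function names (`soft_threshold`, `lasso_fit`, `cv_choose_lambda`, `simulate`). The toy problem has more predictors than subjects but is far smaller than the tens of thousands of predictors the project targets, and no intercept is fitted because the simulated data are centered at zero.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, lam, n_sweeps=50):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of predictor j with the residual that excludes predictor j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def cv_choose_lambda(X, y, lambdas, k=5, seed=0):
    """Return the penalty with the lowest mean held-out error over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_error = {}
    for lam in lambdas:
        errs = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            yhat = predict([X[i] for i in held_out], beta)
            errs.append(mse([y[i] for i in held_out], yhat))
        cv_error[lam] = sum(errs) / k
    best = min(cv_error, key=cv_error.get)
    return best, cv_error

def simulate(n=30, p=60, seed=1):
    """Hypothetical data: only the first two of p predictors affect the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y

if __name__ == "__main__":
    X, y = simulate()
    best, cv_error = cv_choose_lambda(X, y, lambdas=[0.05, 0.1, 0.2, 0.4])
    beta = lasso_fit(X, y, best)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("chosen penalty:", best)
    print("predictors with nonzero coefficients:", selected)
```

Because the L1 penalty sets many coefficients exactly to zero, the fitted model names a subset of predictors, which is one way a penalized fit can serve the interpretability goal alongside predictive accuracy. For data at the scale described here, an established penalized-regression library would be a more realistic choice than this pure-Python sketch, and binary, ordinal, or time-to-event outcomes would require the corresponding loss function in place of squared error.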