PROJECT SUMMARY

High-dimensional data are now commonplace in modern biomedical studies, so dimensionality reduction tools such as clustering and factor analysis are of critical importance and are routinely used. The basic tenet behind dimensionality reduction is that a high-dimensional set of variables can be replaced by a low-dimensional summary. Such a summary is necessary to make sense of complex data and to overcome the problems posed by high-dimensional, low-sample-size data. However, reproducibility is a critical issue that has not been adequately studied. Standard approaches for dimension reduction can be very sensitive to tuning parameters and to arbitrary choices (e.g., choice of kernel or distance measure). This leads to a lack of robustness, with potentially very different results being produced when the data are slightly perturbed. This lack of robustness tends to be compounded as the size of the data increases, in terms of both the sample size and the number of variables collected. A second critical issue is lack of generalizability: dimensionality reduction for one group of individuals may fundamentally fail to generalize to other groups, which creates major problems in interpreting results. Motivated in particular by environmental epidemiology studies collecting exposome data and by nutritional epidemiology, this project proposes to develop fundamentally new methods for improving the robustness and reproducibility of dimensionality reduction through the following specific aims.
(1) Develop robust methods of factor analysis designed to limit sensitivity to arbitrary assumptions and size of the data.
(2) Develop robust methods of model-based clustering designed to limit sensitivity to arbitrary assumptions and size of the data.
(3) Develop novel methods for robust clustering from multivariate and grouped data designed to avoid typical pitfalls of mixture models with increasing p.
(4) Develop robust consensus methods that estimate low-dimensional summaries that best reflect structure across subpopulations.
(5) Apply the proposed methods to data from key epidemiologic cohorts that have measured a wide variety of environmental, behavioral, and biological exposures, and provide a general-use software package for implementation. This package is designed to be easy to use and to accommodate a broad variety of data types, further aiding reproducibility and transparency.