Nonparametric Variable Selection and Dimension Reduction for Predictive Models of Clinical Response in Pharmacogenomics Research

Whole-genome gene expression data have been used in pharmacogenomics research to correlate patients' gene expression profiles with a drug's efficacy. For many complex diseases, e.g., cancers, it is anticipated that gene expression profiles will provide predictive models, more precise than those based on standard clinical features, to define patient-specific treatment strategies. However, finding the gene expression variations that affect drug response is complicated and challenging. The computational difficulties arise because whole-genome gene expression data are high dimensional and their relationships to drug response may be nonlinear. Therefore, one can no longer rely on existing statistical and computational methods to analyze the data adequately. The long-term objective of the proposed project is to develop statistical and computational methods for the analysis of high-dimensional, low-sample-size data and to apply these methods in pharmacogenomics research. The short-term objective is to develop nonparametric variable selection and dimension reduction techniques for predictive models of clinical response based on gene expression data. Three specific aims will be pursued: 1) develop nonparametric variable selection approaches using LOESS (locally weighted scatterplot smoothing), which does not assume a linear or any other specific form for the predictive model of clinical response; 2) extend Sliced Inverse Regression (SIR) to dimension reduction problems in which the dimension is much larger than the sample size, as is the case in pharmacogenomics; 3) apply the proposed methods to pharmacogenomics studies whose data are available in Gene Expression Omnibus (GEO) DataSets (www.ncbi.nlm.nih.gov/gds).
The proposed variable selection and dimension reduction methods are general and extend to other regression problems in which the regression function has no specific form and the data have very high-dimensional predictors but a relatively small sample size. Software implementing the analyses will be written in the R language and will be fully documented for easy use by the biomedical research community.
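
To make the first aim concrete, the following is a minimal sketch of marginal, LOESS-style variable screening. It is not the project's method, and it is written in Python rather than R purely for illustration. It assumes a simplified LOESS (local linear fits with tricube weights, no robustness iterations) and scores each predictor by the in-sample R-squared of its smooth against the response. The function names (`loess_fit`, `loess_r2`, `rank_predictors`, `simulate`), the span of 0.5, and the simulated data are all hypothetical choices made for this example.

```python
import math
import random


def loess_fit(x, y, span=0.5):
    """Simplified LOESS: local linear fit with tricube weights at each x."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(my + slope * (x0 - mx))  # weighted least-squares line at x0
    return fitted


def loess_r2(x, y, span=0.5):
    """In-sample R-squared of the LOESS smooth of y on a single predictor x."""
    fitted = loess_fit(x, y, span)
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst


def rank_predictors(columns, y, span=0.5):
    """Rank predictors (given as a list of columns) by marginal LOESS R-squared."""
    scores = [loess_r2(col, y, span) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


def simulate(n=60, p=20, seed=0):
    """Toy data (assumption): only predictor 0 matters, through a quadratic effect."""
    rng = random.Random(seed)
    columns = [[rng.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(p)]
    y = [x ** 2 + rng.gauss(0.0, 0.3) for x in columns[0]]
    return columns, y


if __name__ == "__main__":
    columns, y = simulate()
    order, scores = rank_predictors(columns, y)
    print("top-ranked predictors:", order[:3])
```

In the toy data the response depends on predictor 0 only through a symmetric quadratic term, so a linear correlation screen would tend to miss it, whereas a smoother-based score does not rely on linearity. A practical screen would also need a cross-validated or otherwise penalized score, since in-sample R-squared favors flexible fits when many noise predictors are present.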