Recent technological advances have produced massive biomedical data sets for statistical analysis, such as cancer microarray data. A common feature of these data is that the number of observed samples is much smaller than the number of variables (predictors), which poses challenges for statistical analysis. Identifying differentially expressed genes and predicting sample phenotype from gene expression data are two important research questions in analyzing these large-scale biomedical data. This project proposes to develop new large-scale prediction and significance analysis methods that are specifically designed to address small sample sizes and potential sample heterogeneity, incorporate existing biological information for improved inference, and can be applied very generally. The usefulness of these methods will be demonstrated on large-scale biomedical data originating from leukemia research projects. These projects aimed to improve the molecular diagnosis and prognosis of cancer by identifying molecular biomarkers for critical early treatment and rapid, noninvasive testing. The specific aims are: 1) Develop new statistical methods for significance testing of large-scale molecular markers. 2) Develop new statistical methods that appropriately model sample heterogeneity for significance testing. 3) Develop new statistical methods that use gene group information to improve cancer prediction. 4) Use the developed models and methods to answer research questions relevant to public health in the leukemia projects; implement and validate the proposed methods in user-friendly, well-documented software; and distribute the software to the scientific community at no charge.