Methods for Genetic Epidemiology

Determining the relevance of both demanding classical epidemiologic criteria for control selection and robust handling of population stratification (PS) is a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans from the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. A novel permutation procedure was developed to correct for PS; it identified a smaller set of principal components and achieved better control of type I error than currently used methods.

Procedures have been developed for case-control GWAS that select the SNPs whose chi-square trend tests are largest (or whose corresponding p-values are smallest). These analyses showed that large samples are needed to have a high detection probability (the chance that a true disease SNP appears in the top ranks of chi-square values). These methods have been extended to the two-stage design. Results suggest that the first stage must be large enough that a disease-associated SNP has a high probability of being among the highest-ranking chi-square values at the end of the first stage. Otherwise, the detection probability will remain small, regardless of how many cases and controls are studied in the second or subsequent stages.

A study has been conducted to assess how much discriminatory accuracy could be improved by adding seven SNPs, which have been shown to be associated with modestly increased breast cancer risk, to the NCI's Breast Cancer Risk Assessment Tool (BCRAT), which is based on a short questionnaire. These SNPs add only slightly to the discriminatory power of BCRAT. Calculations showed that hundreds of such SNPs would be needed to attain high discriminatory accuracy.
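The statement that hundreds of modest-effect SNPs would be needed can be illustrated with a rough calculation. The sketch below is not the calculation performed in this work, and it scores the SNPs alone rather than adding them to BCRAT. It assumes independent SNPs in Hardy-Weinberg equilibrium, a per-allele multiplicative effect, a rare disease, and an approximately normal log relative risk, under which the area under the ROC curve is Phi(sigma / sqrt(2)), where sigma is the standard deviation of the log relative risk. The allele frequency (0.3) and odds ratio (1.15) are hypothetical values chosen only for illustration.

```python
import math


def snp_log_risk_variance(freq, odds_ratio):
    """Variance of the log relative risk contributed by one SNP.

    Assumes Hardy-Weinberg equilibrium, so the risk-allele count is
    Binomial(2, freq), and a per-allele multiplicative effect.
    """
    return 2.0 * freq * (1.0 - freq) * math.log(odds_ratio) ** 2


def approximate_auc(n_snps, freq=0.3, odds_ratio=1.15):
    """Approximate AUC for a risk score built from n_snps identical SNPs.

    freq and odds_ratio are hypothetical illustration values.
    Uses AUC = Phi(sigma / sqrt(2)) with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    which simplifies to 0.5 * (1 + erf(sigma / 2)).
    """
    sigma = math.sqrt(n_snps * snp_log_risk_variance(freq, odds_ratio))
    return 0.5 * (1.0 + math.erf(sigma / 2.0))


# Compare a handful of SNPs with progressively larger panels.
for n in (7, 50, 300):
    print(n, round(approximate_auc(n), 3))
```

Under these assumptions a seven-SNP score discriminates only slightly better than chance, and the approximate AUC rises slowly as SNPs of the same modest effect size are added.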
Very efficient analyses of genetic effects and gene-environment interactions are possible from case-control studies under the assumptions of Hardy-Weinberg equilibrium (HWE) and gene-environment (G-E) independence in the underlying source population. Such analyses, however, can be misleading when these assumptions fail. Empirical-Bayes type shrinkage estimation methodologies have been developed as a robust approach to the analysis of genetic case-control data. These methods adaptively shrink the analysis towards distributions under HWE and G-E independence, but only to the extent that the data justify the assumptions.

BRCAPRO is a statistical model that predicts a woman's chance of carrying a deleterious mutation in the BRCA genes based on her family history of breast and ovarian cancers. Family history is defined as the age at diagnosis of each disease, or the current age or age at death, for each relative. BRCAPRO has been extended to account for medical interventions (such as oophorectomy) amongst relatives and to account for BRCA-related cancers other than breast or ovarian, by developing a rigorous foundation for handling multiple diseases with censoring.

It is common clinical practice to counsel women from BRCA+ families with a history of breast cancer, but who personally test BRCA-, that their family history does not predispose them to a higher risk of breast cancer than anyone in the general population. Data from the Washington Ashkenazi Study (WAS) have been analyzed to show that there remains a residual breast cancer risk, even amongst the BRCA-, of 1.5-fold per first-degree relative (FDR) with breast cancer. These results are being followed up with more sophisticated analyses to refine the residual risk estimates.

Models for Relative Risks of Environmental Exposures

A three-parameter excess relative risk model in pack-years and intensity has previously been shown to quantify the leveling off of smoking-related risk of lung and bladder cancer above 15-20 cigarettes per day.
These analyses have been extended to examine intensity patterns for incident bladder, esophagus, kidney, larynx, liver, lung, oropharynx, and pancreas cancers by using data from a single prospective cohort in Finland, the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study. At more than 10 cigarettes per day, an "inverse exposure rate" pattern has been found for each cancer site. After adjustment for pack-years, intensity effects were quantitatively homogeneous across the diverse cancer sites and homogeneous with intensity effects from the prior analysis of multiple studies. The consistency of intensity patterns suggests a general phenomenon and may provide clues to the molecular basis of smoking-related cancer risk.

A Poisson regression model has been used to study the shape of the relationship between respiratory cancer mortality and cumulative inhaled arsenic exposure among copper smelter workers. Results suggested a direct concentration effect from inhaled inorganic arsenic, whereby the excess relative risk for a fixed cumulative exposure was greater when delivered at a higher concentration over a shorter duration than when delivered at a lower concentration over a longer duration.

Exposure Assessment, Errors in Exposure Measurements, and Missing Exposure Data

For most diseases, single biomarkers do not have adequate sensitivity or specificity for practical purposes. An approach has been developed to combine several biomarkers into a composite marker score without assuming a model for the distribution of the predictors. Using sufficient dimension reduction techniques, the original markers are replaced with a lower-dimensional version, obtained through linear transformations of markers that contain sufficient information for regression of the predictors on the outcome.
The performance of this score is assessed by the area under the receiver operating characteristic (ROC) curve, a popular summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes. An asymptotic chi-squared test for assessing the contribution of individual biomarkers to the diagnostic score is also derived.

For many diseases, it is difficult or impossible to establish a definitive diagnosis because a perfect "gold standard" may not exist or may be too costly to obtain. Methods have been proposed to use continuous test results to estimate the prevalence of disease in a given population and to estimate the effects of factors that may influence prevalence. Motivated by a study of human herpesvirus 8 among children with sickle-cell anemia in Uganda, where two enzyme immunoassays were used to assess infection status, 2-component multivariate mixture models have been fitted. The component densities are modeled using parametric densities that include data transformation as well as flexible transformed models. In addition, the mixing proportion, that is, the probability of a latent variable corresponding to the true unknown infection status, is modeled via logistic regression to incorporate covariates. The model performance is assessed in simulations, and results are presented from applying various parameterizations of the model to the Ugandan study.
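The idea of estimating prevalence from continuous test results without a gold standard can be sketched with a much simpler model than the one described above. The example below is an illustration under strong assumptions, not the method used in this work: it fits a univariate two-component normal mixture by the EM algorithm, with a constant mixing proportion (no logistic regression on covariates) and no data transformation. The function names and the simulated assay values are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Density of a normal distribution evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_two_component_mixture(y, n_iter=500, tol=1e-8):
    """EM fit of a two-component normal mixture.

    Returns (prevalence, means, sds), where prevalence is the estimated
    weight of the component with the higher starting mean, interpreted
    here as the latent "infected" group.
    """
    y = np.asarray(y, dtype=float)
    cut = np.median(y)
    # Crude starting values: split the data at the median.
    mu = np.array([y[y <= cut].mean(), y[y > cut].mean()])
    sd = np.array([y.std(), y.std()])
    prevalence = 0.5
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each value is from component 1.
        d0 = (1.0 - prevalence) * normal_pdf(y, mu[0], sd[0])
        d1 = prevalence * normal_pdf(y, mu[1], sd[1])
        total = d0 + d1
        loglik = np.log(total).sum()
        w = d1 / total
        # M-step: weighted updates of the mixing proportion, means, and SDs.
        prevalence = w.mean()
        mu = np.array([np.average(y, weights=1.0 - w),
                       np.average(y, weights=w)])
        sd = np.sqrt(np.array([
            np.average((y - mu[0]) ** 2, weights=1.0 - w),
            np.average((y - mu[1]) ** 2, weights=w),
        ]))
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return prevalence, mu, sd


# Simulated assay values (hypothetical, not the Ugandan data).
rng = np.random.default_rng(0)
infected = rng.random(2000) < 0.3
values = np.where(infected, rng.normal(4.0, 1.0, 2000), rng.normal(0.0, 1.0, 2000))
est_prevalence, est_means, est_sds = fit_two_component_mixture(values)
print(est_prevalence, est_means, est_sds)
```

Extending this sketch towards the model described above would mean replacing the scalar mixing proportion with a logistic function of covariates and using multivariate, possibly transformed, component densities.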