<b>Methods for Genetic Epidemiology</b><br>Case-control genome-wide association studies provide a vast amount of genetic information that may be used to investigate secondary phenotypes. Methods have been developed to analyze the association between genetic markers and a secondary phenotype in case-control studies while accounting for the bias that can arise from including cases when the secondary phenotype and the genetic marker interact in their effect on the risk of the primary disease. The proposed method adaptively combines the case and control data, and reduces to the controls-only analysis if there is strong evidence of an interaction. Simulations and asymptotic theory indicate that, compared to an analysis of controls only, the adaptively weighted method can reduce the mean squared error for estimation with a pre-specified SNP and increase the power to discover a new association in a genome-wide study.<br><br>Large two-stage genome-wide association studies (GWAS) have been shown to reduce required genotyping with little loss of power, compared to a one-stage design, provided a substantial fraction of cases and controls, pi(sample), is included in stage 1. A new measure of power, the detection probability (DP), has been defined as the probability that a given disease-associated single-nucleotide polymorphism (SNP) will rank among the lowest p-values at stage 1 and, among the SNPs selected at stage 1, again at stage 2. Our results suggest that multistage designs with small first stages (e.g., pi(sample) <= 0.25) should be avoided, and that additional genotyping in earlier studies with small first stages will yield previously unselected disease-associated SNPs.<br><br>It is increasingly recognized that pathway analyses (joint tests of association between the outcome and a group of SNPs within a biological pathway) could complement single-SNP analysis and provide additional insights into the genetic architecture of complex diseases. 
A class of highly flexible pathway analysis approaches has been proposed based on an adaptive rank truncated product (ARTP) statistic that can effectively combine evidence of association over different SNPs and genes within a pathway. Computationally efficient permutation algorithms have been developed for evaluating the statistical significance of the proposed test statistics. The methods are applied to a study of the association between the nicotinic receptor pathway and cigarette smoking behaviors.<br><br>DCEG has recently conducted genotyping of NHANES-III household members, inaugurating the field of truly US population-based genetic research. However, family relationships within NHANES-III households reported in questionnaire data are incomplete and inconclusive with regard to the actual biological relatedness of family members. Statistical algorithms have been developed that use DNA fingerprints (the STRs in the Identifiler assay) to infer family relationships within NHANES-III households. The findings from this study will be pivotal for future family studies and GWAS within NHANES.<br><br><br><b>Models for Relative Risks of Environmental Exposures</b><br>Complex diseases, like cancer, can often be classified into subtypes using various pathological and molecular traits of the disease. Methods have been developed for the analysis of disease incidence in cohort studies that incorporate data on multiple disease traits, using a two-stage, semi-parametric Cox proportional hazards regression model that allows one to examine heterogeneity in the effects of the covariates across the levels of the different disease traits. For inference in the presence of missing disease traits, we propose a generalization of an estimating-equation approach for handling missing cause of failure in competing-risk data. 
The methods are illustrated using a simulation study and a real data application involving the Cancer Prevention Study II (CPS-II) Nutrition Cohort.<br><br>For most diseases, single biomarkers do not have adequate sensitivity or specificity for practical purposes. An approach has been developed to combine several biomarkers into a composite marker score without assuming a model for the distribution of the predictors. Using sufficient dimension reduction techniques, the original markers are replaced with a lower-dimensional version, obtained through linear transformations of the markers that contain sufficient information for regression of the predictors on the outcome. The linear transformations are combined, using their asymptotic properties, into a scalar diagnostic score via the likelihood ratio statistic. The performance of this score is assessed by the area under the receiver operating characteristic (ROC) curve, a popular summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes. An asymptotic chi-squared test for assessing the contribution of individual biomarkers to the diagnostic score is also derived.<br><br><b>Exposure Assessment, Errors in Exposure Measurements, and Missing Exposure Data</b><br>In epidemiologic studies, a partial questionnaire design (PQD) can reduce the cost, time, and other practical burdens associated with a lengthy questionnaire by assigning different subsets of the questionnaire to different, but overlapping, subsets of the study participants. Methods have been developed for semi-parametric inference for regression models under PQD and other study settings that can generate non-monotone missing data in covariates. 
The methods are illustrated using data from a case-control study of non-Hodgkin's lymphoma in which the data on the main chemical exposures of interest were collected using two different instruments on two different, but overlapping, subsets of the participants.<br><br>For many diseases, it is difficult or impossible to establish a definitive diagnosis because a perfect "gold standard" may not exist or may be too costly to obtain. A method is proposed that uses continuous test results to estimate the prevalence of disease in a given population and to estimate the effects of factors that may influence prevalence. Motivated by a study of human herpes virus 8 among children with sickle-cell anemia in Uganda, in which two enzyme immunoassays were used to assess infection status, a two-component multivariate mixture model is fitted. The component densities are modeled using parametric densities that include data transformation as well as flexible transformed models. In addition, the mixing proportion, which is the probability attached to a latent variable representing the true, unknown infection status, is modeled via a logistic regression to incorporate covariates. This model includes mixtures of multivariate normal densities as a special case and is able to accommodate unusual shapes and skewness in the data. The model performance is assessed in simulations, and results are presented from application of the methods to the Ugandan study.
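<br><br>As an illustration of the special case mentioned above, the following is a minimal sketch of a two-component multivariate normal mixture fitted by expectation-maximization (EM). It is not the method described here: it assumes untransformed normal components and a single constant mixing proportion (the prevalence), and it omits the logistic regression on covariates and the flexible transformed densities. The function names `mvn_logpdf` and `fit_two_component_mixture`, the median-based starting values, and the small ridge added to the covariances are all assumptions made for this example.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of x."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    # Whitened residuals: solve chol @ z = (x - mean).T, shape (d, n).
    z = np.linalg.solve(chol, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def fit_two_component_mixture(x, n_iter=200):
    """EM for a two-component normal mixture with a constant mixing proportion.

    Returns the estimated prevalence, the (mean, cov) pairs for the
    "uninfected" and "infected" components, and each subject's posterior
    probability of belonging to the "infected" component.
    """
    n, d = x.shape
    # Hypothetical starting values: subjects above the median of the first
    # assay are provisionally treated as likely infected.
    resp = np.where(x[:, 0] > np.median(x[:, 0]), 0.9, 0.1)
    for _ in range(n_iter):
        # M-step: update prevalence and weighted component moments.
        prevalence = np.clip(resp.mean(), 1e-6, 1.0 - 1e-6)
        params = []
        for w in (1.0 - resp, resp):
            mean = (w[:, None] * x).sum(axis=0) / w.sum()
            diff = x - mean
            # Small ridge keeps the covariance positive definite.
            cov = (w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(d)
            params.append((mean, cov))
        # E-step: posterior probability of the "infected" component.
        log_neg = np.log(1.0 - prevalence) + mvn_logpdf(x, *params[0])
        log_pos = np.log(prevalence) + mvn_logpdf(x, *params[1])
        delta = np.clip(log_neg - log_pos, -700.0, 700.0)
        resp = 1.0 / (1.0 + np.exp(delta))
    return prevalence, params, resp
```

With an array `x` of shape (subjects, assays), `fit_two_component_mixture(x)` returns a prevalence estimate together with per-subject posterior probabilities. Replacing the constant prevalence with a logistic model would mean substituting a weighted logistic regression of the posterior probabilities on covariates for the `resp.mean()` step.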