Methods for Genetic Epidemiology

We derived formulas to calculate the sample sizes needed to detect gene-environment interactions and developed a computer program, POWER, for distribution to epidemiologists to facilitate the design of such studies. We are continuing work to produce a Windows version of POWER. We studied the strengths and weaknesses of the kin-cohort design for estimating the penetrance of an autosomal dominant gene, and we developed a technique for kin-cohort data to detect residual familial correlation among phenotypes after accounting for correlations due to an autosomal dominant gene. We developed marginal methods of analysis for kin-cohort data that are robust to such residual familial correlations. We also developed bivariate cure models to study survival data from pairs of members of randomly selected families. We developed methods to evaluate risks from environmental factors in families selected for genetic studies because they have two or more diseased members. These methods take ascertainment and genetic correlations into account and avoid the biases of conventional analyses that ignore these features. An empirical study indicated that case-control assessments of the association of a candidate gene with disease status are unlikely to be confounded by population stratification, that is, by association of the gene with a subpopulation that is also highly susceptible to the disease. We also reviewed the potential role of classical case-control and cohort designs for conducting studies of genetic factors. Two projects were completed to increase the power of concordant or discordant sib pair designs for detecting genetic linkage. One test gives greater weight to more severely discordant pairs. Another procedure maximizes the minimum power over a set of tests designed to detect different plausible alternatives. Work began on statistical methods for analyzing pooled DNA samples.
Previous work has shown this approach to be efficient, compared to unpooled designs, for estimating prevalence and identifying individuals with a particular rare allele. The present work extends these methods to the estimation of the joint prevalence of two or more alleles. Joint prevalences have application to estimating risks from joint exposures and to estimating the population disequilibrium coefficient.

Methods for Design and Analysis of Case-Control and Cohort Studies

We developed pseudolikelihood methods to analyze population-based two-stage case-control data with cluster sampling of controls. These methods are being used to estimate the relative risk and absolute risk of non-melanoma skin cancer from ultraviolet radiation and from host factors. Analytical methods were developed for case-control studies with supplemental samples of controls who are exposed to one of the factors under study. This research was motivated by a study of the joint risks of hepatocellular carcinoma from hepatitis B and C viruses. Because very few controls had both exposures, it was useful to supplement the control sample by studying hepatitis C prevalence in a previously identified hepatitis B-positive population of cancer-free subjects. We described procedures for estimating variances for relative risk estimates from the case-cohort design and proposed adaptations to handle missing covariates. We developed an efficient cohort sampling design to validate disease outcome status in post-marketing surveillance studies that use large databases to detect adverse effects of drugs.

Exposure Assessment, Errors in Exposure Measurements, and Missing Exposure Data

We developed a "sliding time window" procedure to determine what portion of a time-varying exposure most influences current risk of disease. We also developed a method based on splines to estimate the contribution to current cancer risk of various portions of the previous exposure history.
In an application to case-control data on lung cancer in Germany, cigarette smoking within the previous two to ten years was most predictive of current lung cancer risk. The spline weight function approach is being compared to alternative bilinear weighting methods in the analysis of lung cancer data from the Colorado Uranium Plateau Miners Study.

We investigated imputation methods for missing exposure data and found, contrary to published reports, that it is preferable to use population mean values rather than control mean values in the imputation. The differences are small, however, if the disease is rare and relative risks are small. A related study on imputation of missing data in studies of residential exposure to electromagnetic radiation and to radon demonstrated that control mean imputation may be biased if missingness is not at random.

A detailed investigation of the impact of measurement error in the dose of head and neck irradiation on estimated risks of thyroid cancer took into account Berkson-type errors from the use of phantom "external prediction data", classical error from the use of regressions to predict dose, and missing data. This analysis changed estimates of the role of age at irradiation somewhat, but there was little change in the estimate of excess relative risk per unit dose of radiation.
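The distinction between classical and Berkson-type error drawn above can be illustrated with a small simulation. The sketch below is not the thyroid dose analysis itself: it assumes a simple linear outcome model rather than an excess relative risk model, uses simulated data, and the function names and parameter values are hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, beta=2.0, seed=1):
    """Return (slope under classical error, slope under Berkson error).

    Assumed setup (illustrative only): unit-variance doses, unit-variance
    measurement error, and a linear outcome with true slope `beta`.
    """
    rng = random.Random(seed)

    # Classical error: the recorded dose scatters around the true dose.
    true_dose = [rng.gauss(0.0, 1.0) for _ in range(n)]
    recorded = [d + rng.gauss(0.0, 1.0) for d in true_dose]
    outcome = [beta * d + rng.gauss(0.0, 0.5) for d in true_dose]
    classical = ols_slope(recorded, outcome)

    # Berkson error: the true dose scatters around the assigned dose.
    assigned = [rng.gauss(0.0, 1.0) for _ in range(n)]
    true_dose_b = [a + rng.gauss(0.0, 1.0) for a in assigned]
    outcome_b = [beta * d + rng.gauss(0.0, 0.5) for d in true_dose_b]
    berkson = ols_slope(assigned, outcome_b)

    return classical, berkson
```

Under these assumptions the slope estimated from classically mismeasured doses is attenuated toward about half of `beta`, because the error variance equals the variance of the true dose, while the slope estimated from assigned doses under Berkson error stays close to `beta`, although with more residual scatter.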
We showed that a procedure of Hui and Walter to estimate prevalences, sensitivity, and specificity, in the absence of a gold standard, from repeated measurements in two populations with differing prevalences, is robust to violations of the assumption of common error rates.

Estimates of the effects of environmental tobacco smoke (ETS) on the risk of lung cancer can be obtained using a variety of methodologies, including extrapolations based on descriptive models developed in active smokers; extrapolations based on the Armitage-Doll multistage model for carcinogenesis or the 2-stage clonal expansion model applied to active smokers; and descriptive models of lung cancer and ETS exposure. We showed that the various modeling approaches gave risk estimates that were broadly similar to each other, but somewhat higher than estimates based on a comparison of urinary cotinine levels in active smokers and ETS-exposed individuals.

Other Work

We developed meta-analytic methods to analyze data on surrogate markers to estimate the effect of treatment on a true clinical endpoint. The procedure relies on previous similar studies that measured both the surrogate endpoints and the true clinical endpoints to develop information on the relationships among treatment, surrogate outcomes, and true clinical outcomes. Taking into account the uncertainty in estimates of between-study covariances reveals that estimates of treatment effect from surrogate marker studies are much less precise than estimates based on clinical endpoints.

We used mixture models to analyze the sensitivity and specificity of IgG antibody tests for Helicobacter pylori. This method has the advantage that it does not rely on a cut-point derived from possibly inappropriate external data.
The method allows one to predict the probability that a person is infected with Helicobacter pylori as a function of IgG titre and other covariates.

One investigator developed a suite of MATLAB programs that facilitate the use of this language for sophisticated statistical and epidemiological analyses.
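To make the mixture-model idea for the IgG antibody tests concrete, the following is a minimal sketch rather than the method actually used. It assumes that titres on some transformed scale follow a two-component normal mixture (uninfected and infected), ignores the other covariates mentioned above, and uses simulated data; all function names and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_infected(x, params):
    """P(infected | titre x) under a two-component normal mixture."""
    pi, mu0, sd0, mu1, sd1 = params
    pos = pi * normal_pdf(x, mu1, sd1)
    neg = (1.0 - pi) * normal_pdf(x, mu0, sd0)
    return pos / (pos + neg)


def fit_mixture(x, n_iter=200):
    """Fit the mixture by EM; returns (pi, mu0, sd0, mu1, sd1).

    Component 1 (the higher mean) is treated as the infected group.
    """
    xs = sorted(x)
    n = len(xs)
    half = n // 2
    # Crude starting values: means of the lower and upper halves of the data.
    mu0 = sum(xs[:half]) / half
    mu1 = sum(xs[half:]) / (n - half)
    mean = sum(xs) / n
    sd0 = sd1 = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    params = (0.5, mu0, sd0, mu1, sd1)
    for _ in range(n_iter):
        # E-step: posterior probability of infection for each titre.
        w = [posterior_infected(v, params) for v in x]
        s1 = sum(w)
        s0 = n - s1
        # M-step: weighted means, standard deviations, and prevalence.
        mu1 = sum(wi * v for wi, v in zip(w, x)) / s1
        mu0 = sum((1.0 - wi) * v for wi, v in zip(w, x)) / s0
        sd1 = math.sqrt(sum(wi * (v - mu1) ** 2 for wi, v in zip(w, x)) / s1)
        sd0 = math.sqrt(
            sum((1.0 - wi) * (v - mu0) ** 2 for wi, v in zip(w, x)) / s0
        )
        params = (s1 / n, mu0, sd0, mu1, sd1)
    return params


def simulate_titres(seed=7):
    """Simulated titres: 600 uninfected and 400 infected (made-up values)."""
    rng = random.Random(seed)
    uninfected = [rng.gauss(0.0, 0.6) for _ in range(600)]
    infected = [rng.gauss(3.0, 0.7) for _ in range(400)]
    return uninfected + infected
```

Calling `fit_mixture(simulate_titres())` and passing the result to `posterior_infected` gives a probability of infection for any titre value, so no externally derived cut-point is needed. For any cut-point one might still wish to examine, sensitivity and specificity follow from the fitted component distributions.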