Pooling combines multiple individual biospecimens into a single unit for measurement in order to reduce cost, improve analytic feasibility, and/or improve statistical efficiency. We previously showed that pooling allows highly accurate estimation of the mean, whereas random sampling provides a more efficient estimate of the variance. This led us to develop a cost-efficient hybrid design that samples both pooled and unpooled data in an optimal proportion to efficiently estimate the unknown parameters of the biomarker distribution. Continuing our work on discrimination analysis with pooled biomarkers, we developed methods to estimate covariate-adjusted measures of diagnostic accuracy based on pooled biomarker assessments. Our focus has since evolved toward regression methods with either a pooled exposure or a pooled outcome. For exposure measurements, we developed methods not only for pools formed within strata of a covariate or outcome but also for pools formed independently of other variables, which may be of mixed outcome status. The latter may be highly impactful because it relaxes the need for stratified pools common to current methods and allows secondary analysis of pooled biomarkers after an optimal design that pools dependent on a primary outcome has been employed. Furthermore, we developed methods for normally and gamma distributed exposures and outcomes, which accommodate skewed biomarkers in both roles. Notably, these techniques retain the flexibility to allow a hybrid pooled-unpooled design.

Progress was also made in using pooled samples in efficient designs. We also considered a case-only design for estimating gene-environment interactions related to disease, investigations that are prone to low statistical power because they require a sufficient number of individuals at each level of disease, gene, and environment.
Specifically, a case-only (CO) design was proposed under the assumption of gene-environment independence among controls for a rare disease. However, this independence assumption is not always strictly met. To retain the increased efficiency of the CO estimator while being more robust to departures from this assumption, modifications to the traditional case-control estimator were proposed using a two-stage estimator and an empirical Bayes-type shrinkage estimator. We also developed an efficient design strategy for logistic regression using outcome- and covariate-dependent pooling of biospecimens prior to assay, as a complement to our methods for pooled samples of mixed outcome or covariates.

Regarding biologically informed innovations in biostatistics for epidemiology, we continued to bring together laboratory science and epidemiology. Under this umbrella, two collaborative efforts, funded by a competitive external grant from the American Chemistry Council, were created to bridge biostatistics and etiologic research. The first series of papers reviewed current state-of-the-art statistical methods for handling missing data and promoted pragmatic, principled parametric (e.g., multiple imputation) and semi-parametric (e.g., inverse probability weighting) techniques, arguing that principled missing data methods are as important as adjustment for confounding and that their use should be similarly prevalent in etiologic research. The second series of papers was motivated by interest in outcome-dependent sampling designs as fiscally and statistically efficient designs. Dependent sampling designs enrich a cohort based on an exposure or outcome of interest, thereby collecting data on the most informative individuals. These designs are accompanied by analysis techniques that account for the enrichment and provide proper inference.
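As one illustration of why an outcome-dependent sample needs an analysis that accounts for the enrichment, the sketch below simulates a hypothetical cohort, retains every case but only 10% of non-cases, and compares a naive estimate of exposure prevalence with one that weights each sampled individual by the inverse of the sampling probability. All function names, prevalences, risks, and sampling fractions are assumptions chosen for illustration and are not taken from the papers summarized here; inverse probability weighting is only one of several possible analysis approaches for such designs.

```python
import random


def simulate_cohort(n, seed=0):
    """Simulate (exposed, case) pairs for a hypothetical cohort.

    Assumed values: 30% exposure prevalence, 20% risk if exposed,
    5% risk if unexposed.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        exposed = rng.random() < 0.30
        risk = 0.20 if exposed else 0.05
        case = rng.random() < risk
        cohort.append((exposed, case))
    return cohort


def outcome_dependent_sample(cohort, p_case=1.0, p_control=0.1, seed=1):
    """Keep cases with probability p_case and non-cases with p_control.

    Each sampled record carries its inverse-probability weight.
    """
    rng = random.Random(seed)
    sample = []
    for exposed, case in cohort:
        p = p_case if case else p_control
        if rng.random() < p:
            sample.append((exposed, case, 1.0 / p))
    return sample


def naive_prevalence(sample):
    """Unweighted exposure prevalence, ignoring the sampling design."""
    return sum(exposed for exposed, _, _ in sample) / len(sample)


def ipw_prevalence(sample):
    """Weighted exposure prevalence using inverse sampling probabilities."""
    total_weight = sum(w for _, _, w in sample)
    return sum(w * exposed for exposed, _, w in sample) / total_weight


if __name__ == "__main__":
    cohort = simulate_cohort(100_000)
    sample = outcome_dependent_sample(cohort)
    cohort_prev = sum(exposed for exposed, _ in cohort) / len(cohort)
    print("cohort prevalence:", round(cohort_prev, 3))
    print("naive estimate:   ", round(naive_prevalence(sample), 3))
    print("weighted estimate:", round(ipw_prevalence(sample), 3))
```

Because cases are more often exposed in this assumed setup, the enriched sample overstates exposure prevalence unless the weights are used; the weighted estimate should track the full-cohort value while requiring data on only a fraction of the cohort.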
Because the existing literature was based on statistical idealization, our group sought to broaden the appeal of these methods by honing the designs for specific epidemiologic applications. For example, a cluster-stratified case-control outcome-dependent design was developed, motivated by the practical need to sample patients within clinics rather than across clinics. In all of the papers, operating characteristics and potential trade-offs relative to a standard design were provided as practical guidance and motivation for these highly efficient designs.

More focused work on novel methods specific to reproductive and perinatal epidemiology has led to several valuable contributions. Building on previous longitudinal methodology for the menstrual cycle and pregnancy, we addressed issues surrounding the timing of measurement of maternal and fetal weight gain during pregnancy, which are time-dependent exposures in some analyses and outcomes of interest in others. Specifically, we described a regression-based adjustment for gestational age that produces unbiased estimates of the association between maternal gestational weight gain and neonatal mortality risk. Similarly, a time-to-delivery approach was developed to assess the relationship between maternal weight gain and preterm birth within a survival framework, illustrating how several strategically timed measurements can yield unbiased risk estimates where a naive analysis fails to mitigate bias.

Lastly, with recent developments in causal inference, especially the use of directed acyclic graphs (DAGs), many common terms in epidemiology, such as confounding, selection bias, and measurement error, have been more precisely defined. However, concepts such as overadjustment, specific cases of selection bias, and collinearity remained unexplored. We previously redefined overadjustment bias and truncation in terms of DAGs.
Collinearity is another loosely defined term with broad impact in epidemiologic studies, where the convention is to delete or combine variables that are highly correlated (i.e., collinear). We succinctly showed the consequences of collinearity in linear and logistic regression in three fundamental causal scenarios: intermediates, confounders, and colliders. Using closed-form solutions for linear regression and simulation results for logistic regression, we examined the bias and variance of total effect estimates. The results challenged the dogma of variable reduction and instead supported a focus on a properly specified model, in which unbiased results can be achieved even with near-perfect correlation between the exposure and a given intermediate, confounder, or collider. With increased use of multiplex assays, interest in the exposome, and concerns over environmental chemical mixtures, these findings highlight a critical need to consider the causal framework rather than deleting variables for statistical convenience.

We will continue to generate new methodologies that are born of real-world problems and that are cost-efficient and statistically principled, while incorporating knowledge of the etiologic and measurement processes underlying most biomarkers.
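The collinearity point made earlier for the confounder scenario can be illustrated with a small linear-regression simulation. The sketch below is a hypothetical setup, not the published analysis: a confounder C is almost perfectly correlated with the exposure X (correlation of roughly 0.995), and the outcome depends on both. The coefficient values, sample sizes, and function names are assumptions made for illustration. The properly specified model (adjusting for C) should recover the exposure effect on average at the cost of higher variance, whereas dropping C to avoid collinearity should give a precise but biased estimate.

```python
import random
import statistics


def slope_adjusted(x, c, y):
    """OLS coefficient on x from regressing y on x and c (with intercept)."""
    mx, mc, my = statistics.fmean(x), statistics.fmean(c), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    scc = sum((b - mc) ** 2 for b in c)
    sxc = sum((a - mx) * (b - mc) for a, b in zip(x, c))
    sxy = sum((a - mx) * (d - my) for a, d in zip(x, y))
    scy = sum((b - mc) * (d - my) for b, d in zip(c, y))
    # Solve the 2x2 normal equations for the x coefficient.
    det = sxx * scc - sxc ** 2
    return (scc * sxy - sxc * scy) / det


def slope_unadjusted(x, y):
    """OLS coefficient on x from regressing y on x alone (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (d - my) for a, d in zip(x, y))
    return sxy / sxx


def simulate_once(rng, n=500, beta_x=1.0, beta_c=2.0, x_noise_sd=0.1):
    """One data set: C causes X and Y; X and C are nearly collinear."""
    c = [rng.gauss(0, 1) for _ in range(n)]
    x = [ci + rng.gauss(0, x_noise_sd) for ci in c]
    y = [beta_x * xi + beta_c * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
    return slope_adjusted(x, c, y), slope_unadjusted(x, y)


def run_simulation(reps=200, seed=2024):
    """Repeat the simulation and collect both sets of slope estimates."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(reps)]
    adjusted = [r[0] for r in results]
    unadjusted = [r[1] for r in results]
    return adjusted, unadjusted


if __name__ == "__main__":
    adjusted, unadjusted = run_simulation()
    print("true exposure effect: 1.0")
    print("adjusted:   mean", round(statistics.fmean(adjusted), 2),
          "sd", round(statistics.stdev(adjusted), 2))
    print("unadjusted: mean", round(statistics.fmean(unadjusted), 2),
          "sd", round(statistics.stdev(unadjusted), 2))
```

The same template could be adapted to the intermediate and collider scenarios by changing which variable causes which; in those cases the properly specified model for the total effect differs, so the comparison should be set up accordingly.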