We will address the problem of assessing plasma biospecimen integrity and its effect on downstream analyses and outcome measures such as risk prediction. We hypothesize that identifying, and enabling correction of, post-draw pre-analytical variability at both the sample and the variable level will increase signal-to-noise, and thus the power and precision, of the many biomarker studies being performed on samples from well-characterized human populations. Our approach allows us to bring the resources (data, expertise, platforms) established in six past and ongoing NIH-funded biochemical profiling studies to bear on this problem at comparatively low cost. We already have candidate markers for red cell contamination and for sample degradation due to delayed processing, which is inherent in studies that use central processing. The approaches that yielded these markers are closely related to those needed to address the remaining two pieces, white blood cell and platelet contamination, and recent software advances allow us to expand the coverage of those quantitation approaches by ~10-fold.
The Aims:
Aim 1: To utilize established metabolomics, proteomics, and informatics approaches to identify biomarkers of sample degradation and, separately, biomarkers of contamination of plasma with constituents of red blood cells, white blood cells, or platelets.
Aim 2: To mathematically re-analyze pre-existing metabolomics (HPLC-Coularray-based) and proteomics (UPLC-LTQ-Orbitrap-based) data from samples nested within the Nurses' Health Study (NHS) and within the OMNIHEART and CALERIE Clinical Trials so as (i) to cross-validate the metabolomics and proteomics data to determine inter-assay agreement concerning sample quality, and (ii) to assess the extent and distribution of sample degradation, and the distribution of levels of contamination of plasma with constituents of red blood cells, white blood cells, or platelets, across these studies.
Aim 3: To determine the effect of variable and/or observation exclusion for cause (i.e., red blood cell contamination) with incremented cut-points (e.g., ...0.01%, 0.03%, 0.1%, 0.3%...) on the results of a study of pre-identified biomarkers of caloric intake to predict risk for breast cancer and type II diabetes (750/1000 case-control pairs nested within NHS), and thus to determine the influence of biospecimen integrity directly on both exposure classification (i.e., diet) and end product prediction.
Aim 4: To structurally identify metabolomics and proteomics biomarkers of interest to enable marker propagation to other groups, and to electronically publish biomarker signatures.
PUBLIC HEALTH RELEVANCE: Retrospective analysis of plasma and serum repositories is central to many epidemiological investigations and thus to our knowledge about major public health issues such as the links between diet and disease. Quality issues, such as sample contamination, alteration during processing, and degradation during shipping, can affect assay outcomes, but they are poorly understood, and the systematic markers needed to detect them are lacking. We will leverage our existing plasma datasets to identify such biomarkers/profiles so as to reduce artifact-associated noise and increase the accuracy, power, and precision of the many biomarker studies now being contemplated and performed on these samples.