About 12% of formula-fed infants in the United States are given soy formula. The phytoestrogen genistein is present in high amounts in soy formula; genistein binds to the estrogen receptor, with highest affinity for ERβ. Neonatal rodents exposed to genistein (at doses producing serum levels similar to those of soy-fed infants) show altered estrous cycles, subfertility or infertility, delayed vaginal opening, ovarian dysfunction and uterine adenocarcinoma, along with systemic effects including development of obesity and altered DNA methylation patterns. Soy formula use during infancy is associated with altered age at menarche and with risk of uterine fibroids and endometriosis. Using a unique set of samples from the NIEHS IFED study, we tested the hypothesis that soy formula-fed infant girls have epigenetic alterations in vaginal tissue. As part of the IFED study, serial vaginal swab samples were collected from birth to 9 months of age for histologic analysis of vaginal cells. We were able to obtain more than 200 samples from 28 soy formula-fed and 22 cow formula-fed girls. Illumina 450K methylation array analysis of bisulfite-modified DNA from vaginal cells of 4 soy formula-fed and 6 cow formula-fed girls suggested differences in methylation at three CpGs in the promoter region of the gene proline rich 5 like (PRR5L) (P < 10⁻⁴) (ref. 8). The very small amount of DNA available from most of these samples was insufficient for Illumina array analysis; in order to examine methylation in the promoter of this gene, my laboratory designed a pyrosequencing assay that we demonstrated to be highly correlated with the 450K results (R² = 0.96). Using pyrosequencing, we found that all infant girls at birth have high methylation of the PRR5L promoter region, presumably due to high levels of circulating maternal estrogen.
However, while infant girls who are fed exclusively cow formula (which, like mother's milk, contains no estrogenic compounds) have rapidly falling DNA methylation, samples from infant girls fed genistein-containing soy formula maintain significantly higher levels of DNA methylation at PRR5L. Using TCGA data, we demonstrated that increasing methylation of the CpGs in PRR5L is associated with decreasing mRNA expression of the gene. In addition, we collaborated with Dr. Carmen Williams in the Reproductive and Developmental Biology Laboratory to show that expression of mouse Prr5l is lower in neonatal mice exposed to genistein. Our data provide the first example of epigenetic reprogramming from xenoestrogen exposure in humans and may provide a useful model for studying plant estrogens along with other chemicals with hormonal activity (endocrine disrupters). The epidemiologic evidence for endocrine disruption remains tenuous, reflecting the challenge of connecting very low-dose exposures during infancy and childhood to health effects that may only become manifest years later in adolescents and adults. Although we do not believe that the epigenetic findings of our study constitute a clear contraindication for soy formula use, they do provide a mechanism by which early-life exposure to endocrine-disrupting compounds could lead to later-life health effects.

BMI and blood methylation. Obesity is an established risk factor for type 2 diabetes, cardiovascular disease and other chronic diseases including breast and colon cancer. Excess adiposity contributes to disease risk through a variety of biological pathways that may include epigenetic processes. We have investigated the association between BMI and DNA methylation in two studies: one using blood samples from the Sister Study, and a second, smaller study done directly in normal breast tissue.
For the Sister Study, we used as our primary discovery set 27K methylation data that we had generated on 871 white women as part of our case-cohort study of breast cancer (ref. 1), and as a replication set 450K methylation data that we had generated on women as part of our case-control study of DES (ref. 7). In the 27K discovery set we identified CpGs in 4 genes (LGALS3BP, RORC, ANGPT4 and SOCS3) that were associated with BMI at genome-wide significance (FDR q < 0.05), all of which were also significant after Bonferroni correction in the 450K replication set. We also used the 450K replication set to identify 20 additional sites (not represented on the 27K array) that reached genome-wide significance (FDR q < 0.05). From these 20, we selected five of the top CpG sites for pyrosequencing analysis in my laboratory to determine methylation status in participants from the discovery set; CpGs in all five genes (RPS6KA2, ABCG1, FSD2, STK39 and CRHR2) showed association after Bonferroni correction. Many of the CpGs we identified are in genes linked to obesity and obesity-related diseases. Animal models indicate that SOCS3 expression inhibits leptin signaling and is likely involved in the decreased leptin sensitivity observed in obese individuals; genetic polymorphisms in and near SOCS3 are associated with obesity in human population studies, and decreased DNA methylation in blood at a CpG in the coding region of SOCS3 has been linked to risk of type 2 diabetes. STK39 (Serine Threonine Kinase 39) plays a role in cellular responses to stress; polymorphisms in this gene have been linked to risk of hypertension. RORC (RAR-Related Orphan Receptor C) regulates production of the inflammatory cytokine IL-17 by T helper-17 (Th17) cells; Th17 cell activation and cytokine production appear to play a role in the pathogenesis of diabetes.
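
The two significance criteria mentioned above (FDR q < 0.05 and Bonferroni correction) differ in stringency. The following is a minimal Python sketch of that difference, not the analysis code used in these studies. The function names and the four p-values are hypothetical and chosen only for illustration; the FDR procedure shown is the standard Benjamini-Hochberg step-up method, which is an assumption about which FDR method is meant.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg q-values, returned in the original input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for four CpG sites (illustration only).
pvals = [0.01, 0.02, 0.03, 0.04]
bonf = bonferroni(pvals)
fdr = benjamini_hochberg(pvals)
for p, b, q in zip(pvals, bonf, fdr):
    print(f"p={p:.3f}  Bonferroni={b:.3f}  BH q={q:.3f}")
```

With these made-up values, all four sites pass q < 0.05 but only the first survives Bonferroni correction, which is why replication under Bonferroni is the stricter test.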
The observed association between adult BMI and methylation at cg03218374 in the Angiopoietin-4 (ANGPT4) gene is consistent with an existing study that links methylation at this same CpG in cord blood to infant birth weight.

Methylation arrays have enabled large-scale epigenome-wide studies at single CpG site resolution. The Illumina Infinium HumanMethylation450 BeadChip has been the most commonly used array, providing an estimate of methylation level at about half a million individual CpG sites, and its recently released successor, the MethylationEPIC array, extends coverage to over 850,000 CpG sites. These arrays use probes with two different chemistries (Infinium I and Infinium II) and two fluorescent dyes (Cy3-green and Cy5-red), resulting in complex raw data that require pre-processing before use. The need for pre-processing may not be obvious: for example, duplicate arrays from the same individual may have raw methylation values with very high correlation (R = 0.996), suggesting that unprocessed data quality is good. What can be underappreciated is that arrays from two different individuals have raw methylation values with almost the same high correlation (R = 0.992). In order to assess and improve data quality in our epidemiologic studies of methylation, we have routinely analyzed duplicate samples and laboratory control samples with known methylation. In so doing we have noted that although a variety of pre-processing methods have been published for background correction, probe-type bias correction, and dye-bias correction, these existing methods had theoretical or practical shortcomings. This led us to develop and publish a series of four papers providing improvements in each of the three pre-processing steps, plus a method to estimate 5-hydroxymethylcytosine.
Each new method has been progressively added to our original R software package, ENmix (named after our first published method; ref. 13), which is freely available on the Bioconductor web site, where it ranks in the top 20% of downloaded software with more than 2,200 downloads in 2016 (https://bioconductor.org/packages/release/bioc/html/ENmix.html).
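
The observation that arrays from two different individuals correlate almost as highly as duplicate arrays can be illustrated with a small simulation. The sketch below is in Python rather than R, uses no real data, and is independent of ENmix. It assumes that most CpG sites are either largely unmethylated or largely methylated in everyone. All numerical settings (site means, between-person and measurement standard deviations, number of sites) are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def clip01(v):
    """Keep a simulated methylation value inside [0, 1]."""
    return min(1.0, max(0.0, v))


def simulate(n_sites=5000, seed=1):
    rng = random.Random(seed)
    # Assumed bimodal site means: half the sites low, half high.
    site_mean = [
        rng.gauss(0.10, 0.03) if rng.random() < 0.5 else rng.gauss(0.85, 0.05)
        for _ in range(n_sites)
    ]

    def person():
        # True methylation of one individual: site mean plus a small
        # person-specific deviation (assumed SD 0.03).
        return [m + rng.gauss(0, 0.03) for m in site_mean]

    def array(true_vals):
        # One array measurement: true value plus technical noise
        # (assumed SD 0.02), clipped to the valid range.
        return [clip01(v + rng.gauss(0, 0.02)) for v in true_vals]

    a_true, b_true = person(), person()
    a1, a2, b1 = array(a_true), array(a_true), array(b_true)
    # Duplicate arrays (same person) vs. arrays from different people.
    return pearson(a1, a2), pearson(a1, b1)


r_dup, r_diff = simulate()
print(f"duplicates: R={r_dup:.3f}   different individuals: R={r_diff:.3f}")
```

In this toy setup both correlations come out close to 1 because the correlation across sites is dominated by the large difference between unmethylated and methylated sites, not by agreement at any individual site. A high overall correlation between raw arrays therefore says little about site-level data quality.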