The analysis of individual parts of the genome enables a more comprehensive understanding of how the parts fit together in the broader context of disease. The following projects continue work from previous years and represent integrative analyses of independent genomic data types that address the genome as a complex regulatory system. The analysis areas include cis-acting, trans-acting and epigenetic regulators of the human genome.

Addressing the evolution of the human genome through the emergence of new human-specific genes regulated by bidirectional promoters. Research from my group previously established the enrichment of bidirectional promoters in vertebrate genomes including human, mouse, rat, and cow (Yang et al. 2008), which indicates evolutionary selection to maintain their presence. Despite the cross-species similarities, we discovered that some bidirectional promoters correspond to positions of unidirectional promoters in other vertebrate species, leading to the hypothesis that species-specific bidirectional promoters provide a direct and specific means of detecting species-specific transcripts in any genome. We confirmed this hypothesis while participating in the Bovine Genome Consortium (Bovine Sequencing Consortium et al. 2009) and identified a spliced, highly expressed, multi-exon (noncoding) transcript regulated by a bidirectional promoter that was exclusive to the bovine lineage (Piontkivska et al. 2009). To find human-specific transcripts, my group identified a set of 1,400 nonconserved, novel noncoding transcripts flanking bidirectional promoters (Gotea et al. 2013, PLoS ONE). Once identified, we tested the transcripts for signs of positive selection, as an indicator of beneficial function to the human genome (manuscript submitted). After integrating computational and experimental data, we found nucleotide substitutions that facilitate the emergence of new exons in those genes.
The gene list provides the basis for understanding novel transcripts that are present only in the human genome. Moreover, this approach can identify novel transcripts that are unique to any species. Postscript: The model for the emergence of new noncoding genes through bidirectional promoters is consistent with recent reports showing that the majority of lincRNA genes have bidirectional promoters, whereby uncharacterized noncoding RNAs are positioned opposite the lincRNA.

Comparing genome-wide methylation patterns in subtypes of ovarian tumors and mouse models. I am testing the hypothesis that altered DNA methylation in promoter regions can distinguish genes that are relevant to ovarian tumor pathology. Given the sporadic nature of 90% of ovarian cancers, disruption of normal gene regulation is a likely contributor to disease etiology. We have assessed methylation patterns at 25,475 unique loci in 43 samples of ovarian, endometrial or metastatic tumors, along with normal fallopian tube and normal endometrium. Data from this project showed that methylation patterns mirror histopathological subdivisions of ovarian tumors and discriminate tumor types with finer granularity and greater reproducibility than published gene expression assays (Kolbe et al. 2012). The extensive differences we showed between tumor and normal samples constitute the first report of a methylator phenotype in ovarian endometrioid tumors, analogous to the methylator phenotype identified in colorectal cancer and glioblastoma. Ongoing studies will search for biomarkers for use in diagnostic tests.

Profiling common epigenetic features in solid human epithelial tumors. The study of aberrant DNA methylation in cancer holds the key to the discovery of novel biological markers for diagnostics and can help to delineate important mechanisms of disease.
We have identified 12 loci that are differentially methylated in serous ovarian cancers and endometrioid ovarian and endometrial cancers with respect to normal controls. The strongest signal showed hypermethylation in tumors at a CpG island within the ZNF154 promoter. We show that hypermethylation of this locus is recurrent across solid human epithelial tumor samples for 15 of 16 distinct cancer types from TCGA. Furthermore, ZNF154 hypermethylation is strikingly present across a diverse panel of ENCODE cell lines, but only in those derived from tumor cells. By extending our analysis from the Illumina 27K Infinium platform to the 450K platform, and then to PCR amplification of bisulfite-treated DNA, we demonstrate that hypermethylation extends across the breadth of the ZNF154 CpG island. We have also identified recurrent hypomethylation in two genomic regions associated with CASP8 and VHL. These three genes exhibit significant negative correlation between methylation and gene expression across many cancer types, as well as patterns of DNaseI hypersensitivity and histone marks that reflect different chromatin accessibility in cancer vs. normal cell lines. Our findings highlight hypermethylation of ZNF154 as a relevant biological marker for tumor identification. Epigenetic modifications affecting the promoters of ZNF154, CASP8 and VHL are shared across a vast array of tumor types and may therefore be important for understanding the genomic landscape of cancer. Data from this project have been submitted for publication.

Update of research projects on individual functional elements and community impact.

Exon Skipping. My work to identify sequence mutations that cause exon skipping (Woolfe et al. 2010) applied statistical tests to determine which features had statistically significant predictive ability to discriminate neutral variants from disease-causing mutations.
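The specific tests applied in Woolfe et al. 2010 are not described in this summary. As a generic illustration of how a single feature can be tested for its ability to separate neutral variants from disease-causing mutations, the following minimal sketch runs a two-sided permutation test on the difference in group means. The function name, the feature and all scores are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def permutation_p_value(neutral, disease, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Generic illustration only; not the test used in the cited study.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(neutral) - mean(disease))
    pooled = list(neutral) + list(disease)
    n_neutral = len(neutral)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_neutral]) - mean(pooled[n_neutral:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-variant scores for one sequence feature.
neutral_scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.20]
disease_scores = [0.60, 0.70, 0.55, 0.80, 0.65, 0.75]

p_value = permutation_p_value(neutral_scores, disease_scores)
print(f"permutation p-value: {p_value:.4f}")
```

A permutation test is used here because it makes no distributional assumptions about the feature; in practice each candidate feature would be tested in turn, with a correction for multiple testing.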
We implemented the results in Skippy (http://research.nhgri.nih.gov/skippy/), a web server that evaluates variants of unknown function to predict those most likely to cause exon skipping. Skippy continues to receive the most visits, and downloads for private use, of all NHGRI web servers. In the last year, the Skippy server had 39,381 total page views, an average of 107 page views per day and an average of 12.21 page views per visit. In an application of the Skippy toolset, my group showed that synonymous substitutions detected in cystic fibrosis patients cause exon skipping in CFTR. These variants are novel candidates for uncharacterized second allele mutations in CFTR (Scott et al. 2012).

Negative regulatory elements. My group developed the first systematic expression vector system to experimentally assay negative regulatory elements (Petrykowska et al. 2008). Despite the commonly held hypothesis that negative cis-acting elements are present in the human genome, examples have not been widely defined or characterized. My research to help identify negative elements has broader importance because mutations in these elements would be activating and could therefore play a role in a host of diseases. Annotations of NREs discovered by my group are posted on the UCSC Human Genome Browser test web site (EncodeNhgriNre). Since the inception of the assay, I have provided the vectors as source materials to the community and continue to collaborate with other labs upon request. Furthermore, I have participated in the ENCODE Consortium analysis groups to experimentally assess the functional activity of putative negative regulatory elements predicted in genomic sequences (ENCODE Cons. et al. 2011 and ENCODE Cons. et al. 2012).

Collaborative studies for regulatory element identification. I collaborated to identify unique alterations of an ultraconserved non-coding element in the 3'UTR of ZIC2 in holoprosencephaly (Roessler et al. 2012).
Another project involved the interpretation of genomic mutations identified through disease studies, in which whole-genome sequencing identified a recurrent functional synonymous mutation in melanoma (Gartner et al. 2013).
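Several of the projects above concern synonymous substitutions, which change a codon without changing the encoded amino acid yet can still be functional (for example, by causing exon skipping). The following minimal sketch shows how a single-codon change can be classified as synonymous under the standard genetic code. The function name and the example codons are illustrative assumptions; they are not the variants reported in Scott et al. 2012 or Gartner et al. 2013.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon, alt_codon):
    """Return True if both codons encode the same amino acid (or stop)."""
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    if ref not in CODON_TABLE or alt not in CODON_TABLE:
        raise ValueError("codons must be three letters from A, C, G, T/U")
    return CODON_TABLE[ref] == CODON_TABLE[alt]


# Illustrative codon changes only.
print(is_synonymous("GGA", "GGG"))  # both encode glycine
print(is_synonymous("GAA", "GTA"))  # glutamate to valine
```

A check of this kind only establishes that the protein sequence is unchanged; as the CFTR and melanoma results indicate, it does not establish that the variant is functionally neutral.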