The analysis of individual parts of the genome enables a more comprehensive understanding of how the parts fit together in the broader context of disease. The following projects build on my previous work and use integrative analyses of genomic datasets to address the genome as a complex regulatory system.
Update of research projects on individual functional elements and community impact.
Exon Skipping. My work to identify sequence mutations that cause exon skipping (Woolfe et al. 2010) applied statistical tests to determine which sequence features had statistically significant power to discriminate neutral variants from disease-causing mutations. We implemented the results in a web server, Skippy (http://research.nhgri.nih.gov/skippy/), that evaluates variants of unknown function to predict those most likely to cause exon skipping. Skippy continues to receive the most visits and downloads for private use of all NHGRI web servers. Recent work in my lab, using the Skippy web server to prioritize all known synonymous substitutions in the CFTR gene, identified two previously uncharacterized variants that cause exon skipping and are likely to contribute to cases of cystic fibrosis (Scott et al. 2012). Recent applications of this software by other groups include the identification and functional characterization of 11 novel PLP1 mutations involved in Pelizaeus-Merzbacher disease and spastic paraplegia type 2 (Grossi et al. 2011), the prioritization of mutations detected in the COL4A1 gene involved in Walker-Warburg Syndrome in humans (Labelle-Dumais 2011), the evaluation of mutations involved in risk of systemic lupus erythematosus (Bronson et al. 2011), and work on diseases important to cattle breeding strategies (Gargani et al. 2011).
Negative regulatory elements. My group developed the first systematic expression vector system to experimentally assay negative regulatory elements (Petrykowska et al. 2008).
Despite the commonly held hypothesis that negative cis-acting elements are present in the human genome, examples have not been widely defined or characterized. My research to help identify negative elements has broader importance because mutations in these elements would be activating and could play a role in a host of diseases. Annotations of the negative regulatory elements (NREs) discovered by my group are posted on the UCSC Human Genome Browser test web site (http://genome-test.cse.ucsc.edu/cgi-bin/hgTrackUi?hgsid=2890633&c=chr4&g=wgEncodeNhgriNre). Since the inception of the assay, I have provided the vectors as source materials to the community, and I continue to collaborate with other labs upon request. Furthermore, I have participated in the ENCODE Consortium analysis groups to experimentally assess the functional activity of putative negative regulatory elements predicted in genomic sequences (ENCODE Consortium et al. 2011 and ENCODE Consortium et al. 2012).
New promoter elements. With the objective of identifying new regulatory components in promoters, my group examined mutation data in the Ankyrin-1 (ANK-1) promoter of a patient with ankyrin-deficient Hereditary Spherocytosis (HS) and showed that a novel mutation disrupted the binding of the transcription factor TFIID (Yang et al. 2011, LaFlamme et al. 2010). We hypothesized that the underlying sequence represented a novel promoter element used in the promoters of additional genes. We examined 17,181 human promoters for the experimentally validated binding site, called the TFIID localization sequence (DLS), and found three times as many promoters containing the DLS motif as containing TATA motifs. The region of enrichment was localized to a window from 150 bp upstream to 250 bp downstream of the transcription start site, in a profile similar to but distinct from that of the TFIIB Recognition Element (BRE).
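The promoter scan described above can be illustrated with a minimal sketch. It assumes each promoter is supplied as a sense-strand sequence with the index of its transcription start site, and it counts promoters with at least one motif match in the window from 150 bp upstream to 250 bp downstream. The DLS sequence is not given in this statement, so `PLACEHOLDER_MOTIF` is a made-up stand-in, `SIMPLE_TATA` is a simplified TATA-like pattern, and the function names and toy sequences are hypothetical; this is not the pipeline used in the published analysis.

```python
import re

# Window from the text: 150 bp upstream to 250 bp downstream of the TSS.
UPSTREAM = 150
DOWNSTREAM = 250

# Illustrative patterns only (assumptions, not taken from the source):
PLACEHOLDER_MOTIF = "GGCCAA"   # made-up stand-in; NOT the real DLS sequence
SIMPLE_TATA = "TATA[AT]A"      # simplified TATA-like pattern


def window_around_tss(sequence, tss_index, upstream=UPSTREAM, downstream=DOWNSTREAM):
    """Return the part of `sequence` spanning the TSS window, clipped to the sequence ends."""
    start = max(0, tss_index - upstream)
    end = min(len(sequence), tss_index + downstream)
    return sequence[start:end]


def count_promoters_with_motif(promoters, pattern):
    """Count promoters whose TSS window contains at least one match to `pattern`.

    `promoters` is an iterable of (sequence, tss_index) pairs, with the sequence
    given on the sense strand. Only that strand is searched in this sketch.
    """
    regex = re.compile(pattern)
    return sum(
        1
        for sequence, tss_index in promoters
        if regex.search(window_around_tss(sequence.upper(), tss_index))
    )


# Toy promoters (synthetic sequences, for illustration only).
toy_promoters = [
    ("A" * 200 + "GGCCAA" + "C" * 300, 200),  # placeholder motif at the TSS
    ("C" * 100 + "TATAAA" + "G" * 400, 130),  # TATA-like match 30 bp upstream
    ("GGCCAA" + "T" * 500, 400),              # placeholder motif outside the window
    ("G" * 160 + "GGCCAA" + "T" * 300, 150),  # placeholder motif 10 bp downstream
]

motif_count = count_promoters_with_motif(toy_promoters, PLACEHOLDER_MOTIF)
tata_count = count_promoters_with_motif(toy_promoters, SIMPLE_TATA)
print(motif_count, tata_count)
```

On the toy data, two promoters contain the placeholder motif within the window and one contains the TATA-like pattern. A real comparison would also need to search the reverse strand and use the experimentally validated motif definition.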
Mutational analyses of DLS sequences in promoters of randomly chosen genes confirmed the functional significance of the DLS, and addition of a DLS to an SP1 site indicated that the combination was sufficient to confer basal promoter activity. The results demonstrate that novel promoter elements can be identified on a genome-wide scale through observations of regulatory disruptions that cause human disease.
Projects to integrate multiple functional aspects into disease interpretation.
Address the evolution of the human genome through the emergence of new human-specific genes regulated by bidirectional promoters. Research from my group previously established the enrichment of bidirectional promoters in vertebrate genomes including human, mouse, rat, and cow (Yang et al. 2008), which indicates evolutionary selection to maintain their presence. Despite the cross-species similarities, we discovered that some bidirectional promoters correspond to positions of unidirectional promoters in other vertebrate species, leading to the hypothesis that species-specific bidirectional promoters provide a targeted way to detect species-specific transcripts in any genome. We confirmed this hypothesis while participating in the Bovine Genome Consortium (Bovine Sequencing Consortium et al. 2009) and identified a spliced, highly expressed, multi-exon (noncoding) transcript regulated by a bidirectional promoter that was exclusive to the bovine lineage (Piontkivska et al. 2009). To find human-specific transcripts, my group identified a set of 1,400 nonconserved, novel noncoding transcripts flanking bidirectional promoters (Gotea et al. 2012, submitted). We then tested these transcripts for signs of positive selection as an indicator of beneficial function in the human genome. After integrating computational and experimental data, we found nucleotide substitutions that facilitate the emergence of new exons in those genes.
The gene list provides the basis for understanding novel transcripts that are present only in the human genome. Moreover, this approach can be used to identify transcripts that are unique to any species. Data from this project have been submitted for publication.
Compare genome-wide methylation patterns in subtypes of ovarian tumors. I am testing the hypothesis that altered DNA methylation in promoter regions can distinguish genes that are relevant to ovarian tumor pathology. Given the sporadic nature of 90% of ovarian cancers, disruption of normal gene regulation is a likely contributor to disease etiology. We assessed methylation patterns at 25,475 unique loci in 43 samples of ovarian, endometrial, or metastatic tumors, along with normal fallopian tube and normal endometrium. Data from this project showed that methylation patterns mirrored histopathological subdivisions of ovarian tumors and discriminated tumor types with finer granularity and greater reproducibility than published gene expression assays (Kolbe et al. 2012). The extensive differences we showed between tumor and normal samples constitute the first report of a methylator phenotype in ovarian endometrioid tumors, analogous to the methylator phenotype identified in colorectal cancer and glioblastoma. Ongoing studies will search for biomarkers for use in diagnostic tests.