The NHGRI Bioinformatics and Scientific Programming Core actively supports the research of NHGRI investigators by providing expertise and assistance in bioinformatics and computational analysis. The Core facilitates access to specialized software and hardware, develops generalized software solutions that can address a variety of questions in genomic research, develops database solutions for the efficient archiving and retrieval of experimental and clinical data, disseminates new software and database solutions to the genome community at large, collaborates with NHGRI researchers on computationally intensive projects, and provides educational opportunities in bioinformatics to NHGRI investigators and trainees. Scientific projects completed in 2012-2013 include the evaluation of shared motifs in sequences flanking non-Alu-driven breakpoint deletions in FANCA and FANCC; the analysis of RNA-seq data to detect global changes in gene expression and splicing due to RRP1B knockdown; the design and implementation of the Clinical Genomic Database (CGD), a searchable, Web-based database of all conditions with known genetic causes; the creation of a comprehensive list of mutations by merging lab-generated genotypes with those in COSMIC (Catalogue of Somatic Mutations in Cancer); quality control and subsequent retrieval of genomic coordinates from HGVS-formatted cDNA and protein mutations; pathway analysis of a list of genes implicated in melanoma by exome sequencing of metastatic melanoma samples; prediction of gene regulatory regions in thymocytes of Itk-deficient mice; collation of multi-center survey data to study the association between glucocerebrosidase mutations and Lewy body dementia; and analysis of sequence traces to detect mutations for the zebrafish TILLING project. Ongoing scientific projects include the annotation of the Mnemiopsis genome using NextGen sequence data; the detection of gene and isoform expression changes during early development by
RNA-seq; the analysis of human disease gene orthologs in Mnemiopsis; the analysis of sequence traces to detect mutations in putative oncogenes in tumor samples; development of a human malformation terminology tool to allow clinicians to build and maintain downloadable spreadsheets of patient conditions; the characterization of large exons in vertebrate, invertebrate, and plant genomes; ongoing updates and improvements to the Breast Cancer Information Core (BIC); the development of a bioinformatic pipeline to map zebrafish retroviral integration sites using Illumina sequence tags and to identify integrations occurring within ENSEMBL-annotated genes; development of a Web site and database to search for integration sites; the analysis of RNA-seq data to investigate alternative splicing in Fanconi anemia patients; the analysis of alternatively spliced genes in select tissue types over time; the identification of a set of 500 methylation markers that classifies different types of tumors; the identification of DNA binding sites of RRP1B by ChIP-seq; determination of the effects of RRP1B knockdown on gene expression; the development of a customized SQL database for storing and computing on large numbers of records for canine genotypes, phenotypes, sequences, variations, sample data, and pedigree data; the analysis of ChIP-seq data to identify the genomic locations of specific histone modifications in dog bladder tumor cell lines; genome-guided, ab initio, and de novo transcript assembly for RNA-seq data; the analysis of ChIP-seq data to investigate Sox10 transcription factor binding at enhancer sites in mouse and to correlate Sox10 binding with EP300 and H3K4me1 binding; the mapping and annotation of transcription factors to experimental and predicted transcription factor binding sites; the development of a Web-based survey studying how women feel about the techniques doctors use to talk with patients about their weight (Weight Management Interaction Study); identification of
co-varying mutations and pathways to classify subtypes of tumors; measurement of gene expression changes in thymocytes over four different stimulations and four different mutations; biomarker selection of targets from RNAi screens; multi-species BLAST searches to estimate cross-species sequence similarity of protein-coding genes; a software pipeline to create a word cloud from SMS text responses sent by visitors as they interact with the Smithsonian genome exhibit; development of a Web interface and data collection for eight surveys developed by SBRB PIs in conjunction with the Smithsonian genome exhibit; and investigation of the effect of silent mutations on splicing using TCGA RNA-seq data.