Bioinformatics Developments
The Comparative Genomics Analysis Unit continues to develop, maintain, and distribute software tools for the analysis of DNA and RNA sequence data. This year, QoRTs, the unit's software package for quality control of RNA-Seq data, was described in a manuscript published in BMC Bioinformatics (Hartley and Mullikin, 2015). QoRTs generates a range of quality control metrics and plots that make it easy for a bioinformatician to identify consistent biases that would otherwise be obscured by the size and dimensionality of RNA-Seq data. The package is available on GitHub (https://github.com/hartleys/QoRTs). We have also developed a new software tool, JunctionSeq, which reports differential usage of exonic junctions in RNA-Seq data between two sets of biological replicate samples representing different biological conditions. The unit also developed a software pipeline that runs four somatic mutation detection algorithms (Shimmer, SomaticSniper, MuTect, and Strelka) on a set of tumor and matched normal samples, and then reports VarSifter-formatted somatic mutations with variant allele frequencies and gene annotations for all samples. We submitted somatic single-nucleotide and deletion-insertion variants to the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, contributing to a 2015 publication in Nature Methods (Ewing, Houlihan, et al., 2015). Finally, the unit compared five software packages for detecting sequence variants in pooled DNA samples and found that LoFreq, CRISP, and the Genome Analysis Toolkit showed the best sensitivity for detecting low-frequency singleton variants (Huang, NISC Comparative Sequencing Program, et al., 2015).
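The multi-caller step described above can be illustrated with a minimal sketch. It is not the unit's pipeline and does not produce VarSifter output; it only shows, under stated assumptions, how calls from several somatic callers might be merged by site and how a variant allele frequency is computed from read counts. The function names, the variant tuple layout, and the example calls are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def merge_calls(calls_by_caller: Dict[str, Iterable[Variant]]) -> Dict[Variant, List[str]]:
    """Take the union of all calls and record which callers reported each variant."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    # Sort variants by key and caller names alphabetically for stable output.
    return {variant: sorted(callers) for variant, callers in sorted(support.items())}


def variant_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth else 0.0


# Made-up calls for illustration only; real input would be parsed from each caller's output.
example_calls = {
    "Shimmer": [("chr1", 100, "A", "T")],
    "SomaticSniper": [("chr1", 100, "A", "T"), ("chr2", 500, "G", "C")],
    "MuTect": [("chr1", 100, "A", "T")],
    "Strelka": [("chr2", 500, "G", "C")],
}
merged = merge_calls(example_calls)
for variant, callers in merged.items():
    print(variant, "called by", ", ".join(callers))
```

Recording the supporting callers for each site, rather than keeping only the intersection, leaves the choice of how many callers must agree to a later filtering step.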
Collaborative Work
The unit continues to collaborate with investigators at NHGRI who are sequencing cancer samples to find putative driver mutations that may contribute to tumorigenesis. In collaboration with Dr. Daphne Bell, a senior investigator at NHGRI, we analyzed Sanger sequence data from genes in the tyrosine kinome in endometrial cancer samples, and found that the TNK1 and DDR2 genes are frequently mutated in the catalytic domain in serous and endometrioid endometrial cancers, but not in clear cell endometrial cancers (Rudd, Mohamed, et al., 2014). In collaboration with Dr. Paul Liu of NHGRI, we sequenced and analyzed core binding factor (CBF) acute myeloid leukemia samples from diagnosis, remission, and relapse to characterize the mutational landscape of these samples and investigate potential mechanisms in the clonal evolution of relapsed leukemia (Sood, Hansen, et al., 2015). Together with investigators from the Smithsonian Institution, University of Maryland, and Duke University Medical Center, we developed genomic resources for the endangered Hawaiian honeycreepers. This unit's contribution was the whole-genome assembly of a Hawaii amakihi (Hemignathus virens) and the detection of 3.9 million single-nucleotide variants from sequence data generated at the NIH Intramural Sequencing Center (Callicrate, Dikow, et al., 2014). Continued investigation of the ClinSeq dataset, this time focusing on loss-of-function (LoF) mutations, provided the opportunity to examine individuals carrying these mutations for associated phenotypic effects (Johnston, Lewis, et al., 2015). A broad collaborative effort headed by Dr. Bruce Howard at NICHD studied age-related changes in gene expression and methylation patterns of human ovarian granulosa cells using DNA methylome and transcriptome sequencing (Yu, Russanova, et al., 2015).