Bioinformatics Developments

The Comparative Genomics Analysis Unit continues to develop, maintain, and distribute software tools for the analysis of DNA and RNA sequence data. This year, JunctionSeq, the unit's software package for analyzing splice junction usage in RNA-Seq data, was described in a manuscript published in Nucleic Acids Research (Hartley and Mullikin, 2016). JunctionSeq detects differential usage of both known and novel splice junctions without requiring a separate isoform assembly step, which greatly improves sensitivity when the available transcript annotation is flawed or incomplete. JunctionSeq also provides a powerful, streamlined visualization toolset that lets bioinformaticians interpret their results quickly and intuitively. The package is extensively documented and includes a comprehensive walkthrough and an example dataset, with line-by-line instructions covering the complete analysis pipeline. JunctionSeq is included in Bioconductor release 3.3 (http://bioconductor.org/packages/JunctionSeq/) and is available, with additional online help and documentation, at the JunctionSeq GitHub page: http://hartleys.github.io/JunctionSeq/. In 2016, the unit began to implement and improve software tools for determining sample identity and purity from very low coverage sequencing data. Using a previously published Bayesian model of identity by descent (IBD), we established the feasibility of distinguishing self, sibling, parental, or unrelated relationships between a pair of samples. In addition, we demonstrated the use of separate, publicly available software to determine the ancestry of samples from very low coverage sequencing data. These tools and models allow extensive quality control testing of metagenomic or forensic samples that contain very small amounts of human DNA.
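The relationship classes named above correspond to well-known expected IBD sharing proportions (k0, k1, k2), the probabilities that two samples share zero, one, or two alleles identical by descent at a locus. The sketch below is a deliberately simplified, hypothetical illustration: it assigns a pair of samples to the class whose expected proportions lie nearest (by Euclidean distance) to proportions that have already been estimated. It is not the Bayesian model the unit used, and it skips the hard part, estimating k0, k1, and k2 from very low coverage reads. The function name `classify_relationship` and its input format are assumptions.

```python
import math
from typing import Dict, Tuple

# Expected IBD sharing proportions (k0, k1, k2) for each relationship class:
# the probability of sharing 0, 1, or 2 alleles identical by descent at a locus.
EXPECTED_IBD: Dict[str, Tuple[float, float, float]] = {
    "self": (0.0, 0.0, 1.0),              # same individual (or identical twin)
    "parent-offspring": (0.0, 1.0, 0.0),  # always exactly one allele IBD
    "full siblings": (0.25, 0.5, 0.25),
    "unrelated": (1.0, 0.0, 0.0),
}


def classify_relationship(k: Tuple[float, float, float]) -> str:
    """Hypothetical nearest-expectation rule: return the relationship class
    whose expected (k0, k1, k2) is closest to the estimated proportions k."""
    if abs(sum(k) - 1.0) > 1e-6:
        raise ValueError("IBD proportions must sum to 1")
    return min(EXPECTED_IBD, key=lambda rel: math.dist(k, EXPECTED_IBD[rel]))


if __name__ == "__main__":
    print(classify_relationship((0.02, 0.96, 0.02)))  # parent-offspring
    print(classify_relationship((0.30, 0.45, 0.25)))  # full siblings
```

A probabilistic model such as the one the unit used would instead weigh the likelihood of the observed reads under each relationship, which matters precisely when coverage is too low to estimate k0, k1, and k2 reliably.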
Because the unit's whole exome analysis pipeline continues to lead to important genomic discoveries (see the Collaborative Work section below), we contributed a book chapter (Hansen, 2016) to a recent volume of Springer's Methods in Molecular Biology detailing the use of our MPG software to call germline variants from next-generation sequencing reads and format them for viewing in VarSifter. This description of the unit's variant calling pipeline serves both as an instruction manual for our software and as documentation of the algorithms used to call variants.

Collaborative Work

Reproducibility is receiving increased attention across many domains of science, and genomics is no exception. In this reporting cycle, we collaborated on two exome sequencing analysis projects examining reproducibility. The first assessed the technical replication variability of detectable sequence variation (Cherukuri, Maduro et al. 2015), and the second examined the reproducibility of copy number variation predictions made by various programs (Hong, Singh et al. 2016). These results point to the need for more accurate copy number variation callers and for better ways to evaluate them. In an RNA sequencing study during this reporting cycle (Hartley, Coon et al. 2015), we analyzed the expression profiles of rat pineal gland tissues with and without surgical interruption of pineal gland stimulation, and established night/day differences in the expression of thousands of genes. Then, in in vitro experiments, treatment with norepinephrine and dibutyryl cyclic AMP, chemicals involved in pineal transcription, reestablished these night/day differences in many of the same genes, showing that the pineal-defining transcriptome is established prior to the neonatal period in rats. These findings help to advance the field of neurotranscriptomics, the study of the neural control of transcriptomes.
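The first reproducibility study above compares variant calls made from technical replicates. One simple way to quantify that kind of agreement is sketched below as a hypothetical illustration: each replicate's calls are treated as a set of exact (chrom, pos, ref, alt) keys, and the sketch reports Jaccard concordance together with the calls unique to each replicate. Exact matching assumes both call sets were normalized the same way (for example, with indels left-aligned), and Jaccard concordance is an illustrative choice, not necessarily the metric used in Cherukuri, Maduro et al. (2015). The names `Variant` and `replicate_concordance` are assumptions.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A normalized variant key (assumed representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def replicate_concordance(
    a: Set[Variant], b: Set[Variant]
) -> Tuple[float, Set[Variant], Set[Variant]]:
    """Jaccard concordance between two replicate call sets, plus the calls
    found only in replicate a and only in replicate b (the discordant calls)."""
    union = a | b
    if not union:
        return 1.0, set(), set()  # two empty call sets agree trivially
    concordance = len(a & b) / len(union)
    return concordance, a - b, b - a


if __name__ == "__main__":
    rep1 = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
            Variant("chr2", 50, "G", "A")}
    rep2 = {Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "A"),
            Variant("chr3", 10, "T", "C")}
    score, only1, only2 = replicate_concordance(rep1, rep2)
    print(score)  # 0.5: two shared calls out of four distinct calls
```

Returning the discordant calls alongside the score is deliberate: in replicate studies, those calls are the ones that need inspection, for example to check whether they cluster in low-coverage or repetitive regions.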
Previously collected exome datasets from the ClinSeq project were reanalyzed for two different studies. In one study (Beck, Mullikin et al. 2016), we performed a large-scale, systematic evaluation of Sanger-based validation of next-generation sequencing (NGS) variants using ClinSeq data and determined that a single round of Sanger sequencing is more likely to incorrectly refute a true-positive NGS variant than to correctly identify a false-positive one. We therefore came to the surprising conclusion that validating NGS-derived variants with Sanger sequencing has limited utility, and that best practice standards should not include routine orthogonal Sanger validation of NGS variants. In the second study (Ng, Hong et al. 2016), we used ClinSeq exome and whole genome sequencing data to assess the utility of NGS as a preemptive pharmacogenetic screen for 203 clinically relevant pharmacogenetic variant positions drawn from the Pharmacogenomics Knowledgebase and the Clinical Pharmacogenetics Implementation Consortium, and to identify copy number variants (CNVs) in CYP2D6. The unit continues to collaborate with investigators on the study of rare diseases through the analysis of exome datasets. In (Boyden, Desai et al. 2016), we discovered a missense substitution in ADGRE2 in patients with autosomal dominant vibratory urticaria. The replacement of cysteine with tyrosine at amino acid position 492 (p.C492Y) was the only nonsynonymous variant co-segregating with vibratory urticaria in the two large families included in the study. This variant probably leads to a less stable interaction between the protein's subunits, allowing that interaction to be disrupted more easily by vibration. In (Kruszka, Uwineza et al. 2015), an individual with features of limb body wall complex and his unaffected parents underwent whole exome sequencing and analysis, which revealed a de novo heterozygous mutation in the gene IQCK: c.667C>G; p.Q223E.
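Finding a de novo variant in a child-parent trio, as in the IQCK case above, rests on a simple Mendelian check: the child's genotype must be formable from one allele transmitted by each parent. The sketch below is a hypothetical illustration of that check, not the unit's MPG or VarSifter code. It assumes biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1, 2), and the variant identifiers are made-up placeholders. Real de novo calling also requires read depth, genotype quality, and population frequency filters, which are omitted here.

```python
from typing import Dict, List, Set, Tuple

# Alternate-allele counts a parent can transmit, keyed by the parent's
# genotype (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alt).
TRANSMISSIBLE: Dict[int, Set[int]] = {0: {0}, 1: {0, 1}, 2: {1}}


def mendelian_consistent(child: int, mother: int, father: int) -> bool:
    """True if the child's genotype can be formed from one allele per parent."""
    return any(
        m + f == child
        for m in TRANSMISSIBLE[mother]
        for f in TRANSMISSIBLE[father]
    )


def de_novo_candidates(trio_calls: Dict[str, Tuple[int, int, int]]) -> List[str]:
    """Variants (keyed by a placeholder id) that the child carries but
    neither parent carries; genotypes are (child, mother, father)."""
    return [
        vid
        for vid, (child, mother, father) in trio_calls.items()
        if child > 0 and mother == 0 and father == 0
    ]


if __name__ == "__main__":
    calls = {
        "chr1:100A>G": (1, 0, 0),  # absent from both parents: de novo candidate
        "chr2:500C>T": (1, 1, 0),  # inherited from the mother
        "chr3:42G>A": (2, 1, 0),   # homozygous child, one carrier parent
    }
    print(de_novo_candidates(calls))  # ['chr1:100A>G']
    print([v for v, g in calls.items() if not mendelian_consistent(*g)])
    # ['chr1:100A>G', 'chr3:42G>A']
```

Every de novo candidate is Mendelian-inconsistent by construction, but not every inconsistency is de novo: a case like chr3:42G>A can also reflect a genotyping error or a deletion on the other parental allele, which is why such sites are reviewed rather than accepted automatically.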
The IQCK variant was shown to be functionally important in zebrafish. In (Maduro, Pusey et al. 2016), we reported the genomic analysis of a three-generation family segregating mild intellectual disability together with a chromosomal translocation. Using whole genome sequencing, we further resolved this as a complex unbalanced chromosomal rearrangement that disrupts TCF4 and alters TCF4 isoform expression. In (Malicdan, Vilboux et al. 2015), we identified biallelic mutations in KIAA0586, the human ortholog of TALPID3, in six children with overlapping features of Jeune and Joubert syndromes. These mutations were detected by applying inheritance models to each of the six child-parent trios, screening for Mendelian-inconsistent/de novo, dominant, and compound heterozygous mutations, then comparing candidates with population allele frequencies and considering their predicted consequences. Similarly, in (Vilboux, Malicdan et al. 2016), we applied inheritance models to three affected individuals in two families presenting with similar cerebellar and ocular involvement. This led to the discovery of new LAMA1 mutations, broadening the phenotypes associated with LAMA1 mutations. In (Zhou, Wang et al. 2016), we discovered loss-of-function mutations in TNFAIP3 leading to A20 haploinsufficiency as the cause of a rare form of early-onset autoinflammatory disease.
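The biallelic screening described for the KIAA0586 and LAMA1 studies can be sketched as two per-trio rules: a homozygous variant in the child whose parents are both heterozygous carriers, or two heterozygous variants in the same gene, one inherited from each parent (compound heterozygosity). The sketch below is a hypothetical illustration under stated assumptions, not the unit's pipeline. It assumes unaffected heterozygous carrier parents, biallelic autosomal sites, and genotypes coded as alternate-allele counts. Gene names and variant labels are placeholders, and the population frequency and consequence filters mentioned in the text are omitted.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Call(NamedTuple):
    gene: str     # placeholder gene symbol
    variant: str  # placeholder variant label
    child: int    # alternate-allele count: 0, 1 or 2
    mother: int
    father: int


def is_homozygous_recessive(c: Call) -> bool:
    """Child homozygous for the variant, both parents heterozygous carriers."""
    return c.child == 2 and c.mother == 1 and c.father == 1


def compound_het_genes(calls: List[Call]) -> Dict[str, List[str]]:
    """Genes where the heterozygous child carries at least one maternally and
    at least one paternally inherited variant. Variants whose parent of origin
    is ambiguous (both parents heterozygous) are skipped."""
    by_gene = defaultdict(lambda: {"maternal": [], "paternal": []})
    for c in calls:
        if c.child != 1:
            continue
        if c.mother == 1 and c.father == 0:
            by_gene[c.gene]["maternal"].append(c.variant)
        elif c.father == 1 and c.mother == 0:
            by_gene[c.gene]["paternal"].append(c.variant)
    return {
        gene: sides["maternal"] + sides["paternal"]
        for gene, sides in by_gene.items()
        if sides["maternal"] and sides["paternal"]
    }


def biallelic_genes(calls: List[Call]) -> Set[str]:
    """Genes matching either recessive model in this trio."""
    homozygous = {c.gene for c in calls if is_homozygous_recessive(c)}
    return homozygous | set(compound_het_genes(calls))


if __name__ == "__main__":
    calls = [
        Call("GENE_A", "v1", 1, 1, 0),  # maternal allele
        Call("GENE_A", "v2", 1, 0, 1),  # paternal allele: GENE_A is compound het
        Call("GENE_B", "v3", 1, 0, 0),  # de novo, not a recessive candidate
        Call("GENE_C", "v4", 2, 1, 1),  # homozygous recessive
        Call("GENE_D", "v5", 1, 1, 0),  # single maternal hit only
    ]
    print(compound_het_genes(calls))       # {'GENE_A': ['v1', 'v2']}
    print(sorted(biallelic_genes(calls)))  # ['GENE_A', 'GENE_C']
```

Requiring one variant from each parent is what separates true compound heterozygosity (variants on different chromosome copies) from two variants on the same copy, which would leave one functional allele; the trio design makes that phase information available without read-level phasing.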