Bioinformatics Developments

The Comparative Genomics Analysis Unit continues to develop, maintain, and distribute software tools for the analysis of DNA and RNA sequence data. This year, we developed SVanalyzer, a package of tools for the precise detection and specification of structural variants (SVs). These tools allow users to characterize the ambiguity of SVs with respect to nearby sequence similarity (SVwiden), detect equivalent SV predictions by comparing altered sequences (SVcomp), genotype known SVs in new datasets (SVbackgenotype), and refine SV predictions using long-read assemblies (SVrefine). SVanalyzer is currently being used by the National Institute of Standards and Technology's Genome in a Bottle project to integrate SV calls from multiple calling algorithms and sequencing platforms.

Collaborative Work

An ongoing collaboration with Dr. Susan Harbison studying the genetics of selection-driven sleep duration in fruit flies (Drosophila melanogaster) has resulted in two publications (Harbison, Serrano Negron et al. 2017; Serrano Negron, Hansen et al. 2018) and was selected for the 2018 NHLBI Orloff Science Award. Our specific contribution was the analysis of 84 whole-genome sequence datasets from long- and short-sleeping D. melanogaster lines produced by an evolve-and-resequence experiment. This analysis used two aligners (BWA and novoalign), two genome builds (Dm3 and Dm6), one variant caller (LoFreq), and custom software developed to calculate allele frequencies.

In continued collaboration with Dr. Daphne Bell, we performed somatic mutation detection on 14 uterine carcinosarcoma tumor/normal pairs and identified fifteen genes that were somatically mutated in at least two tumors. Sanger sequencing of these fifteen genes in an additional 39 primary uterine carcinosarcomas newly implicated FOXA2 in tumorigenesis (Le Gallo, Rudd et al. 2018).

In a collaboration with Dr. Philip Shaw, we searched exome datasets for rare, de novo variants that might underlie ADHD discordance between monozygotic twins. However, in the eight twins who consented to exome sequencing, neither copy number variants nor rare, deleterious SNVs emerged as a major driver of discordance in this small sample (Chen, Sudre et al. 2018).

The assembly and analysis of the dust mite (Dermatophagoides pteronyssinus) genome has shed new light on the class of proteins involved in allergic responses in humans. We assembled the genome from PacBio circular consensus sequence (CCS) reads into 834 contigs, with a total assembled size of 52 Mb and an N50 of 376 kb. Because the starting DNA came from hundreds of individual mites collected from an inbred colony, the CCS reads also captured genomic variation, enabling detection of the variation present within this colony. RNA extracted from the mites was sequenced as well, allowing a combined analysis of sequence variation and its effect on isoforms, with a specific focus on the genes encoding allergenic proteins (Randall, Mullikin et al. 2018).

In 2010, the Morris Animal Foundation received a pledge of $1,000,000 from Hill's Pet Nutrition to develop a Cat SNP Chip. SNP discovery efforts in the domestic cat had generated a set of over 10 million SNPs from which to draw for this array. The SNPs selected for the array were polymorphic in more than one breed and reasonably well distributed across the genome. The final array consisted of 62,897 variants and was used to genotype over 2,000 cats. Both the performance of this array and analyses of cat population structure are presented in Gandolfi, Alhaddad et al. 2018.
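The SV ambiguity that SVwiden characterizes, and the sequence-based equivalence that SVcomp tests, can be illustrated with the simplest case: a deletion inside a tandem repeat. The sketch below is a hypothetical illustration, not SVanalyzer's implementation or interface; the helper names (`apply_deletion`, `equivalent`, `ambiguity_window`) and the toy reference string are invented for this example. Two deletion calls are treated as equivalent when they produce identical altered sequences, and the ambiguity window is the range of start positions that all yield that same altered sequence.

```python
from typing import Tuple

# Hypothetical helpers for illustration only; not part of SVanalyzer.
Deletion = Tuple[int, int]  # (0-based start, length)


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the altered sequence produced by deleting ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


def equivalent(ref: str, a: Deletion, b: Deletion) -> bool:
    """Two deletion calls are equivalent if they produce the same altered sequence."""
    return apply_deletion(ref, *a) == apply_deletion(ref, *b)


def ambiguity_window(ref: str, start: int, length: int) -> Tuple[int, int]:
    """Leftmost and rightmost start positions giving the same altered sequence."""
    left = start
    # Shifting one base left is invisible if the base entering the deleted
    # segment (ref[left - 1]) matches the base leaving it (ref[left - 1 + length]).
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1
    right = start
    # Shifting one base right is invisible if the base leaving the segment
    # (ref[right]) matches the base entering it (ref[right + length]).
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


ref = "TTCACACAGG"  # toy reference containing a CA repeat
print(ambiguity_window(ref, 4, 2))      # (2, 6)
print(equivalent(ref, (2, 2), (6, 2)))  # True
print(equivalent(ref, (1, 2), (2, 2)))  # False
```

Comparing altered sequences, rather than reported coordinates, is what lets two callers that place the same repeat-contraction at different positions be recognized as describing one event.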
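The allele-frequency step in the Drosophila evolve-and-resequence analysis was done with custom software whose details are not given here. As a minimal sketch, assuming reference and alternate read counts have already been extracted for each site from pooled sequencing, a per-site frequency and the frequency difference between long- and short-sleeping pools could be computed as follows. The function names and the `min_depth` cutoff of 10 reads are hypothetical choices for illustration.

```python
from typing import Optional, Tuple


def allele_frequency(ref_count: int, alt_count: int, min_depth: int = 10) -> Optional[float]:
    """Alternate-allele frequency at one site, or None if coverage is too low."""
    depth = ref_count + alt_count
    if depth < min_depth:  # hypothetical coverage cutoff
        return None
    return alt_count / depth


def frequency_shift(
    long_counts: Tuple[int, int],
    short_counts: Tuple[int, int],
    min_depth: int = 10,
) -> Optional[float]:
    """Difference in alt-allele frequency between long- and short-sleeping pools."""
    f_long = allele_frequency(*long_counts, min_depth=min_depth)
    f_short = allele_frequency(*short_counts, min_depth=min_depth)
    if f_long is None or f_short is None:
        return None
    return f_long - f_short


# Toy (ref, alt) counts, not real data.
print(allele_frequency(30, 10))                       # 0.25
print(round(frequency_shift((12, 28), (36, 4)), 2))   # 0.6
```

Returning `None` for under-covered sites keeps low-depth positions out of downstream comparisons instead of silently reporting a noisy frequency.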
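For the uterine carcinosarcoma study, the criterion "somatically mutated in at least two tumors" means each tumor counts once per gene, no matter how many mutations it carries in that gene. A small sketch of that recurrence filter, assuming somatic calls are available as (tumor ID, gene) pairs; the function name and the gene and tumor labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def recurrent_genes(calls: Iterable[Tuple[str, str]], min_tumors: int = 2) -> Set[str]:
    """Genes carrying somatic mutations in at least `min_tumors` distinct tumors."""
    tumors_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for tumor_id, gene in calls:
        # A set ensures multiple hits in the same tumor count only once.
        tumors_by_gene[gene].add(tumor_id)
    return {gene for gene, tumors in tumors_by_gene.items() if len(tumors) >= min_tumors}


# Hypothetical calls: GENE_A is hit twice in T1 and once in T3.
calls = [("T1", "GENE_A"), ("T1", "GENE_A"), ("T2", "GENE_B"), ("T3", "GENE_A")]
print(sorted(recurrent_genes(calls)))  # ['GENE_A']
```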
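The dust mite assembly is summarized by its N50 of 376 kb: the contig length such that contigs of that length or longer contain at least half of the assembled bases. A short sketch of the standard calculation; the contig lengths in the usage line are invented toy values, not the mite assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    sizes = sorted(lengths, reverse=True)
    if not sizes:
        raise ValueError("no contigs")
    total = sum(sizes)
    running = 0
    for size in sizes:
        running += size
        # Stop at the first contig that brings the cumulative sum to half the total.
        if 2 * running >= total:
            return size
    return sizes[-1]  # not reached for positive lengths


print(n50([100, 80, 50, 30, 20]))  # 80
```

Comparing `2 * running` against the total avoids a floating-point half and keeps the calculation in integers.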