The African malaria mosquito Anopheles gambiae, because of its epidemiological importance, was the first disease vector to have its genome sequenced. In order to perform full genome annotation for An. gambiae, all sequences that have a biological role in this malaria vector must be identified. However, large gaps, incorrect orientation of some scaffolds, and unmapped sequences still pose a serious problem for accurate annotation and functional characterization of the genome. Knowledge of the full complement of mosquito genes, regulatory elements, and repetitive elements is incomplete without information about what lies in those missing sequences. Moreover, the abundance of incorrectly assembled "hybrid" M and S sequences in the current PEST genome assembly leads to confusion between paralogous genes and genes from different haplotypes. The assembly of gene-poor, repeat-rich heterochromatic regions is especially fragmented, and no gene from the heterochromatic Y chromosome has been found. The presence of readable polytene chromosomes and the paramount impact of malaria on public health make An. gambiae the best choice as the first vector with a highly finished genome assembly. The major thrust of this R21 project is to develop a high-quality genome assembly and to explore the genetic content of heterochromatin and the Y chromosome of the Anopheles gambiae S-form using high-throughput technologies. The availability of equipment for automated in situ hybridization, next-generation sequencing, laser microdissection, and confocal microscopic analysis, as well as the expertise of the PI and Co-PI in cytogenetics and genomics, will ensure successful achievement of the project's goals. Briefly, the project's specific aims are to 1. Close interscaffold gaps in the S-form assembly by high-throughput whole-genome resequencing and targeted sequencing of BAC/fosmid clones. 2. Physically map and orient sequencing scaffolds to the polytene chromosomes. 3.
Microdissect, sequence, and characterize the An. gambiae heterochromatin and Y chromosome. This project will greatly improve the current fragmented genome assembly for the An. gambiae S-form. A new genome assembly will enable researchers to work on functional annotation of the most complete sequence set and to perform more comprehensive comparative genomics studies with other vectors and model organisms. The availability of genes from the Y chromosome will allow researchers to develop sex-specific vector control interventions and conduct phylogenetic analysis of the An. gambiae complex without the complications of introgression. The scientific community will be able to access the new assembly from VectorBase for further mining and functional characterization of the An. gambiae genome. PUBLIC HEALTH RELEVANCE: Targeting genes responsible for vector competence is a novel approach for controlling infectious diseases. A full genome annotation for the major African malaria vector Anopheles gambiae will facilitate identification of epidemiologically important targets. Toward this goal, the proposed project will develop a high-quality genome assembly for Anopheles gambiae using high-throughput technologies.