PROJECT SUMMARY
Short peptides (10-100 aa) are important regulators of physiology, development, and metabolism; however, they are difficult to detect because of their small size and low abundance. A stunning 30% of annotated human small open reading frame (smORF) genes include disease-associated variants mapped within exons, compared to 15% of human genes in general. Further, many smORFs are conserved across the entire metazoan phylogeny, from invertebrates to vertebrates, including humans. We call these ultra-conserved functional smORF genes the Conserved smORF Catalog (CSC). These genes have been conserved across more than 500 million years of evolution, yet we know almost nothing about their functions. Owing to a century of genetic analysis, the genome of the model organism Drosophila melanogaster has the most complete functional annotation among metazoans. Functional annotations derived from Drosophila have been instrumental in hypothesis-based drug development for more than thirty years, and more recently have made possible the biological interpretation of hundreds of SNPs detected in genome-wide association studies (GWAS). Hence, functional annotations derived in the fly for conserved genes are transferable to humans and are of direct clinical relevance. Remarkably, fewer than 10% of smORFs in Drosophila have been studied functionally or experimentally verified as generating peptides. We will use a combination of genome engineering, computational, molecular, and functional studies to systematically and comprehensively characterize the CSC. This will be the first genome-scale characterization of smORFs in any organism and will provide a wealth of information on the biological functions of this poorly studied class of proteins. In total, we will characterize and functionally annotate ~400 conserved smORFs using CRISPR knockout followed by phenotyping and rescue assays.
We will assess the phenotypes of the mutants, measuring viability, morphology, fecundity and fertility, lifespan, metabolism (sugar and lipid levels), and a number of behavioral phenotypes. For smORFs with robust phenotypes, we will then attempt to rescue a subset of these mutants in three ways: first, by reinserting the entire deleted RNA; second, with a version of the RNA in which the smORF(s) are disrupted by the addition of a stop codon; and third, with a micro-construct containing only the smORF and the endogenous promoter. We will generate direct evidence for translation using tagged expression analysis and targeted MS/MS to scan for predicted polypeptides in whole-embryo and dissected-tissue samples. In addition to validating the existence of the predicted molecules, this dataset will provide a foundational gold standard for further development of tools for the computational prediction of functional micropeptides. These studies are directed toward the understanding of basic life processes and lay the foundation for promoting better human health.