PROJECT SUMMARY
A major challenge common to understanding phenotypic diversity, modeling selection in evolution, and developing precision medicine is enhancing our currently limited ability to predict disease and phenotypic outcomes based on genome sequence and environmental exposures. A comprehensive understanding of genetic variation and its role in conditioning phenotypes requires systematic, perturbation-based testing of genetic variants across the genome in multiple environments and in an isogenic background. Previous systematic genome perturbation efforts have focused primarily on engineering loss-of-function mutations, but naturally occurring variants have the most relevance to understanding medically relevant phenotypes such as human traits and disease. Such variants have been studied via genome-wide association studies (GWAS) and quantitative trait locus (QTL) analysis, but these approaches are limited to the haplotypes that appear in the study population, and the actual causative variants have been identified in only a few cases. Advances in genome editing technologies have made engineering specific genetic variants feasible at a large scale. This proposal aims to systematically engineer and functionally profile a genome-wide 'variation collection', built in three genetically distinct strains, that covers all natural single-nucleotide variants (SNVs) in the Saccharomyces cerevisiae species as well as SNVs associated with human diseases. The collection will be constructed by a high-throughput CRISPR approach, leveraging an in-house sequence parsing technology (Recombinase Directed Indexing, or REDI) that will allow rapid, inexpensive isolation of sequence-verified variant strains from among the millions that will be generated. Because some variants exert their effects only in certain environments, this strain collection will be profiled in hundreds of conditions, including exposure to various stresses and drugs.
DNA barcodes integrated into the genome of each strain will enable pooled, competitive growth and allow comprehensive identification, in a single experiment, of the variants in a genome that modulate fitness in a given condition. Finally, to dissect the genetic architecture of pathways underlying diseases and identify key interactions, strains carrying combinations of SNVs will be analyzed. The strain collection will be made available to the community for further phenotypic investigations. In addition to the gene x environment (GxE) dataset, which will likely be the largest produced to date, the technological, analytical, and visualization pipelines will be publicly shared and integrated into community resources. This work will constitute an unprecedented investigation of the consequences of genetic variation and their dependence upon environment, while providing valuable resources for the scientific community. It will lay the technological and conceptual groundwork for systematic perturbation-based studies of genetic variation in human cells that will inform the prediction of disease risk and the design of therapeutic strategies based on genome sequence.