Highly parallel functional characterization of human regulatory elements

PROJECT SUMMARY

One of the most effective means of identifying human regulatory elements is the discovery of open chromatin using methods such as DNase hypersensitivity or FAIRE. While there is ample evidence that open chromatin regions are functional and bound by sequence-specific regulatory factors, we typically do not know what function an individual element has, or how DNA sequence variation in human open chromatin regions affects that function. Traditionally, function has been measured experimentally in reporter assays, one element at a time. However, it is not feasible to characterize the ~100,000 open chromatin regions that exist in each cell type using low-throughput, serial methods. We propose to develop two complementary approaches to overcome these obstacles. The first will test the function of tens of thousands of human regulatory elements in a single experiment, and the second will test the effect of natural human sequence variation within 10,000 of those elements in a single experiment, representing 1,000- to 10,000-fold improvements over existing methods. First, putative regulatory elements isolated by FAIRE will be cloned en masse into a Gateway-based entry vector, allowing us to easily swap the inserts into reporters that test promoter, enhancer, insulator, or silencer function. Cells containing inserts with biological activity can be isolated by cell sorting, and the corresponding inserts can be identified by next-generation sequencing. We will also develop a variant of this method that does not require cell sorting. A second major obstacle to discovering the effect of human sequence variation on the function of regulatory elements is the limited ability to measure the effects of a large number of designed DNA sequences in a highly controlled setting.
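One way to make the "sort, then sequence" readout concrete is to score each insert by how over-represented it is among sorted, reporter-positive cells relative to the unsorted library. The sketch below is illustrative only: it assumes reads have already been assigned to inserts, and the function name `log2_enrichment`, the pseudocount, and the choice of a log2 frequency ratio are assumptions, not a scoring method specified in this proposal.

```python
import math
from typing import Dict


def log2_enrichment(
    sorted_counts: Dict[str, int],
    input_counts: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical scoring: log2(frequency in sorted cells / frequency in input library).

    A pseudocount is added to every insert so that inserts absent from one
    population still receive a finite score.
    """
    inserts = set(sorted_counts) | set(input_counts)
    n = len(inserts)
    sorted_total = sum(sorted_counts.values()) + pseudocount * n
    input_total = sum(input_counts.values()) + pseudocount * n
    scores = {}
    for insert in inserts:
        f_sorted = (sorted_counts.get(insert, 0) + pseudocount) / sorted_total
        f_input = (input_counts.get(insert, 0) + pseudocount) / input_total
        # Positive scores suggest the insert drives reporter activity.
        scores[insert] = math.log2(f_sorted / f_input)
    return scores


if __name__ == "__main__":
    sorted_reads = {"region_A": 90, "region_B": 10}
    library_reads = {"region_A": 50, "region_B": 50, "region_C": 20}
    for name, score in sorted(log2_enrichment(sorted_reads, library_reads).items()):
        print(name, round(score, 2))
```

Normalizing by the total read count of each population keeps the score independent of sequencing depth; a real analysis would likely also need replicate sorts to judge which enrichments are reproducible.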
Using Agilent array technology, we will synthesize 10,000 regulatory sequences, each ~200 bp in length, that correspond to alternate alleles of 5,000 putative regulatory regions. These 5,000 regions will be selected based on their linkage to human disease risk. After transfection into cells, we will use a flow cytometer to sort the resulting pool of transfected cells into 64 bins of reporter levels, amplify the inserted synthesized region from the cells of each bin by PCR, and measure the DNA content of each activity bin by next-generation sequencing. For every barcode (each of which represents one tested element), the distribution of its next-generation sequencing reads across the expression bins provides a measure of both the mean and the standard deviation of its expression. Promoter, enhancer, insulator, and silencer function will be tested. Because all tested sequences are transfected into the same cell line, the trans-acting factor environment is held constant, allowing us to directly test whether genetic variation among human individuals has a causal effect on expression.
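The step from "reads across 64 bins" to "mean and standard deviation of expression" can be sketched as a read-weighted average over bin positions. This is a minimal sketch under stated assumptions: bins are taken to be equal-width on a log10 fluorescence scale, raw reads are converted to estimated cells by dividing by each bin's total reads and multiplying by the number of cells sorted into that bin, and all function names (`bin_centers`, `normalize_reads`, `expression_stats`, `allele_shift`) and scale limits are hypothetical rather than taken from the proposal.

```python
import math
from typing import List, Sequence, Tuple


def bin_centers(n_bins: int, log_min: float, log_max: float) -> List[float]:
    """Centers of n_bins equal-width bins spanning [log_min, log_max] (log10 units, assumed)."""
    width = (log_max - log_min) / n_bins
    return [log_min + width * (k + 0.5) for k in range(n_bins)]


def normalize_reads(
    barcode_reads: Sequence[float],
    bin_read_totals: Sequence[float],
    bin_cell_counts: Sequence[float],
) -> List[float]:
    """Estimate cells per bin for one barcode: (reads / bin total reads) * cells sorted into bin."""
    return [
        r / t * c if t > 0 else 0.0
        for r, t, c in zip(barcode_reads, bin_read_totals, bin_cell_counts)
    ]


def expression_stats(weights: Sequence[float], centers: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and standard deviation of log10 expression for one barcode."""
    total = sum(weights)
    if total == 0:
        raise ValueError("barcode has no reads in any bin")
    mean = sum(w * c for w, c in zip(weights, centers)) / total
    var = sum(w * (c - mean) ** 2 for w, c in zip(weights, centers)) / total
    return mean, math.sqrt(var)


def allele_shift(ref: Sequence[float], alt: Sequence[float], centers: Sequence[float]) -> float:
    """Difference in mean log10 expression, alternate minus reference allele."""
    return expression_stats(alt, centers)[0] - expression_stats(ref, centers)[0]


if __name__ == "__main__":
    centers = bin_centers(64, 0.0, 6.4)  # hypothetical 64 bins, 0.1 log10 units wide
    ref = [0.0] * 64
    alt = [0.0] * 64
    ref[10], ref[12] = 50.0, 50.0  # reference allele reads split over two bins
    alt[20] = 100.0  # alternate allele reads concentrated in one higher bin
    print(expression_stats(ref, centers))
    print(allele_shift(ref, alt, centers))
```

Comparing the two alleles of a region by the shift in mean, and optionally by the change in standard deviation, is one plausible way to summarize a variant's effect; because both alleles are measured in the same pool of cells, that shift is not confounded by differences in trans-acting factors.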