PROJECT ABSTRACT Defining the features of cellular mixtures, where diverse cell types with distinct genomic characteristics are physically intermingled, is a central problem in biology. For example, diseases such as cancer are characterized by cellular masses composed of subpopulations, each with its own set of genetic variants and transcriptional signatures, where inter-population DNA variation is compounded by cell-to-cell stochasticity in RNA expression. Characterizing genomic diversity in cellular mixtures and assessing its impact on cell-to-cell gene expression variation require analyses at the resolution of individual cells and contiguous genome molecules. This level of analytical resolution is now feasible with next-generation sequencing (NGS) assays that integrate molecular barcoding with single-cell RNA sequencing and single-molecule DNA sequencing. These technological advances surmount key challenges and herald new opportunities for the study of disease, but require new analysis methods: (1) Current NGS methods are not optimal for detecting and phasing genomic variants from cellular mixtures. For example, it is difficult to detect complex structural variants (SVs) that are carried by only a fraction of the genomes present within a mixture. Methods based on short-read data are hindered by the loss of long-range contiguity in heavily fragmented DNA as well as the low mappability of many SV junctions. Single-molecule linked-read DNA sequencing overcomes these drawbacks, but still lacks reliable analysis methods. (2) Single-cell RNA sequencing allows the detection of distinct cellular subpopulations with unique transcriptional signatures; however, data from individual cell transcriptomes have high levels of error and bias. New analysis procedures are needed to make statistically sound inferences.
(3) Existing methods for single-cell expression analysis typically ignore DNA heterogeneity, which can be crucial for some studies, especially in cancer. It remains unclear how to simultaneously characterize variation at both the DNA and RNA levels in a cellular mixture. This proposal addresses these issues by developing new statistical methods and experimental designs that enable accurate characterization of cellular mixtures exhibiting both DNA and RNA variation. We propose to develop methods to (1) detect, characterize, and phase complex variants using new single-molecule sequencing technology, (2) improve expression estimates obtained from single-cell RNA sequencing data, and (3) combine bulk single-molecule DNA sequencing and single-cell RNA sequencing to quantify the relationship between DNA variation and transcriptomic variation in genetically heterogeneous samples such as cancer.