Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of DNA sequencing. In applications such as RNA-seq and single-cell sequencing, the molecular complexity of the underlying biological sample is also of central interest. This project will produce computational methods for predicting the number of distinct molecules that would be observed by sequencing an existing library more deeply. We will adapt these methods to predict, as a function of sequencing depth, both saturation in RNA-seq and the fraction of the genome covered at or above a given fold in genome resequencing. We will also develop methods for estimating the heterogeneity of phenotypes in a tissue from single-cell RNA-seq experiments. These methods will allow investigators to optimize their use of DNA sequencing resources, minimizing waste and improving throughput.