Single-cell RNA sequencing (scRNA-seq) has recently emerged as a powerful technology for investigating transcriptomic variation and regulation at the level of individual cells. Traditional bulk RNA-seq pools RNA from a large number of cells and measures the average expression in a sample. In contrast, scRNA-seq reveals cell-to-cell heterogeneity, providing information critical to understanding biological processes in development, differentiation, and disease etiology. This new technology has expanded applications in both basic and clinical research, but its unique data characteristics also pose challenges for analysis. These include: 1) difficulty in estimating molecule counts in the presence of technical artifacts, which arise from the small amount of starting material and the additional sample preparation procedures; 2) a lack of appropriate methods for functional clustering of single-cell RNA count data, which are much sparser than bulk RNA-seq data; and 3) a lack of quantitative measures for assessing and comparing heterogeneity. We propose to address these challenges by developing a series of novel statistical methods for scRNA-seq data preprocessing and analysis. These methods include removing technical bias in RNA capture and amplification to obtain accurate molecule-level counts, identifying functional types and subtypes of cells together with interpretable feature groups, explaining heterogeneity between samples and between cells, and identifying differential heterogeneity. All methods developed in this project will be implemented and released as free, open-source software to benefit the genomics research community. The probability model and statistical framework established in this proposal will lay a foundation for future methodology development for other single-cell sequencing experiments, such as single-cell ATAC-seq or BS-seq.