This application addresses broad Challenge Area (08) Genomics and the specific challenge topic 08-DA-102, Improved Bioinformatics Analysis for Deep Sequencing. The number of human samples undergoing whole-genome sequencing is expected to increase dramatically in the next few years, as advances in next-generation sequencing technologies continue to lower the cost of sequencing. In addition to detecting sequence variation, these data can be used to estimate DNA copy number variation and subsequently to examine the correlation between copy number and phenotype. In this proposal, we aim to develop a series of computational steps and an integrated analysis pipeline for accurate estimation of copy number from next-generation sequencing data. This involves efficient processing of the sequencing data, including appropriate alignment procedures and correction for experimental artifacts. To estimate copy number along each chromosome, we will develop novel segmentation procedures, both for a single sample and for multiple samples, that take advantage of the specific nature of sequencing data. Importantly, we also address issues in experimental design, especially the effect of sequencing depth (genome coverage) and read length on the resolution and accuracy of copy number profiles. We use data from a number of platforms, including Solexa, SOLiD, and Complete Genomics, for our studies. The pipeline developed in this proposal will be implemented on a powerful distributed computing system and will be made freely available to the research community. The results of this project will thus enable efficient extraction of copy number from whole-genome sequencing data and will facilitate rapid translation of next-generation sequencing technology to the identification of structural variations associated with normal or disease phenotypes.