Recent studies of cancer genomes at the single-cell level have revealed the complexity of the disease and the presence of multiple genealogically related cell populations in a tumor. Detailed knowledge of the clonal structure of a cancer is potentially of high clinical value: the multiplicity of clones, or of lesions in the most advanced clones, is a possible measure of progression; the spatial pattern of clone dispersal in a tumor may signal an elevated propensity to invade; and lesions observed in individual clones but not in the bulk tissue may point to targets for therapy. DNA copy number profiling of cells from low-coverage sequencing is an accurate, economically feasible technological approach to the study of cancer sub-population structure. Novel multiplex sequencing techniques, developed by the Wigler lab at CSHL, permit simultaneous sequencing of hundreds of single-cell DNA specimens and their subsequent copy-number profiling at up to 50 kb resolution. Optimal use of this data type for robust reconstruction of cancer cell phylogenies is a challenging computational problem requiring new informatic and statistical tools. The purpose of this project is twofold: first, to develop, optimize, test and deploy such tools; and second, to integrate the results of phylogenetic analysis with other forms of genomic, biological, pathological and clinical data. The centerpiece of the proposed phylogenetic analysis pipeline is a recently developed computational approach termed CORE (Cores Of Recurrent Events), which transforms copy number profiles of single cells into a form suitable for phylogenetic reconstruction. CORE is a flexible method, with multiple options allowed for several of its components. This freedom of choice will be exploited to optimize performance. The entire pipeline will be tested using both simulated data with known underlying phylogeny and genomic profiles that we have already generated from pathologically characterized tumor tissues.
Preliminary application of CORE to the phylogenetic reconstruction problem showed enough promise to justify further study. In parallel, we propose to design an interactive user interface that integrates single-cell genomic data with their phylogenetic interpretation and with several other forms of genomic, clinical and pathological annotation. The potential utility of such a comprehensive integrated interface is strongly suggested by a preliminary study. All software developed in the course of the project will be made available to the community for use and further development.