Heterogeneous Cancer Progression from Microarray Data

The class of diseases collectively known as cancer could in principle be produced by a limitless number of combinations of mutations. Nonetheless, it has become apparent that most cancers can be grouped into a few common "sub-types," each characterized by a common way in which the controls on cell growth become disabled. By identifying these sub-types and the particular sequences of genetic abnormalities that produce them, we can identify patient sub-populations who may respond to different treatments than the general population, find genes that may be useful targets for new anti-cancer drugs, and develop diagnostic tests that better predict patient outcomes and suggest which drugs will benefit which patients. Great progress has been made by examining gene expression within tumors, because different cancer sub-types have characteristic patterns of overactive or underactive genes. Interpreting these expression data is, however, a difficult problem for which sophisticated computer models have proven invaluable. One class of computer models, phylogenetic (evolutionary tree) models, has provided a powerful method for inferring the likely pathways by which different cell types evolve within tumors. There are two important variants of this phylogenetic approach: one uses data from gene expression microarrays, which assay thousands of genes averaged over large tumor samples; the other uses data from cytometric studies, which assay small numbers of genes in individual cells isolated from tumors. Each has advantages: the former gives a far more complete picture of overall gene activity, while the latter provides valuable clues about tumor evolution by identifying which cell types co-occur in individual tumors. The proposed work will develop new computer models that combine the advantages of both methods in a single approach.
The work will first develop approaches to infer the existence of common cell types from bulk microarray measurements of tumors sampled across patient populations. It will then build on prior methods to infer evolutionary similarity between these tumor states. Finally, it will adapt methods for cytometric tumor phylogenetics to the problem of inferring evolutionary sequences from these microarray states. The result will be a unified approach for inferring evolution among individual cell states, as in a cytometric study, but assayed on thousands of genes, as in a microarray study. The unified approach will be validated on breast cancer data, for which both microarray and cytometric measurements are available, and applied to the discovery of common progression pathways in breast cancer populations. The study can be expected to uncover distinct stages in breast cancer progression that would not be apparent with existing methods, aiding in the identification of new patient sub-populations, drug targets, and diagnostic tests. The methods to be developed are likely to have broader applicability to solid tumor progression in general and to related problems of analyzing cell differentiation in mixed samples.
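
The final step, linking inferred cell states into a progression tree, can be illustrated with a small sketch. This is not the method the proposed work will develop. It assumes, purely for illustration, that the cell-state expression profiles are already known, that Euclidean distance between profiles is a usable proxy for evolutionary similarity, and that a minimum spanning tree rooted at a normal state can stand in for a phylogeny. The state names, the expression values, and the function `progression_tree` are all hypothetical.

```python
import math

# Hypothetical expression profiles (arbitrary units) for four inferred cell
# states over three genes. The values are invented for illustration only.
states = {
    "normal": [1.0, 1.0, 1.0],
    "early": [2.0, 1.0, 1.0],
    "subtype_a": [4.0, 1.0, 1.0],
    "subtype_b": [2.0, 2.5, 1.0],
}


def distance(p, q):
    """Euclidean distance between two expression profiles (an assumed
    stand-in for evolutionary dissimilarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def progression_tree(profiles, root):
    """Grow a minimum spanning tree outward from `root` (Prim's algorithm).

    Returns (parent, child) edges in the order they were added, so each
    edge points away from the root state.
    """
    in_tree = [root]
    remaining = [name for name in profiles if name != root]
    edges = []
    while remaining:
        # Pick the cheapest edge joining a state in the tree to one outside it.
        parent, child = min(
            ((p, c) for p in in_tree for c in remaining),
            key=lambda pc: distance(profiles[pc[0]], profiles[pc[1]]),
        )
        edges.append((parent, child))
        in_tree.append(child)
        remaining.remove(child)
    return edges


edges = progression_tree(states, "normal")
for parent, child in edges:
    print(f"{parent} -> {child}")
```

With these invented profiles, both hypothetical sub-types attach to the "early" state rather than directly to "normal", which is the kind of branching progression structure a phylogenetic model is meant to expose. A real analysis would need to estimate the state profiles from mixed bulk samples first and would use a distance measure and tree-building method suited to expression data.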