The standard method for preserving tissue morphology for pathological diagnosis and for archiving tumor samples is formalin fixation followed by paraffin embedding (FFPE). Archival tumor samples, such as those available in epidemiological and clinical settings where tumor blocks have been archived for 20 years or more, are rich sources of tumor material for a broad range of research questions. Because FFPE samples are used for virtually all routine pathology tests, they can provide information on gene expression in large patient populations with long-term clinical follow-up. Opening the vast archives of FFPE tissues to high-throughput expression profiling is critical to the development of clinically relevant biomarkers and to the genomic study of cancer subtypes as they relate to lifestyle and environmental factors. Along with this promise for both population and clinical research come significant technical and data-analytic challenges. These arise from the degradation and cross-linking of RNA, which are intrinsic to the FFPE methodology. All existing and foreseeable technologies for expression measurement will entail sources of variation unique to FFPE. Our ability to fully exploit information in archival samples depends critically on the availability of principled, reliable, tailor-made, and publicly available tools for statistical and bioinformatic analysis. The identification of prostate cancer subtypes is a perfect case in point. A focus on more homogeneous groups may enhance understanding of the underlying mechanisms of disease and lead to more successful treatment and prevention through strategies tailored to each subtype. However, progress in this area has been hampered by the modest sample sizes and the opportunistic designs common to mRNA profiling studies of fresh frozen (FF) tissues.
The potential to discover novel prognostic subtypes through gene expression profiling of large cohorts of FFPE samples is a unique opportunity to advance the field of prostate cancer biomarkers. The investigative team brings together in-depth experience in statistical methods for both cancer epidemiology and genomic data analysis, expertise in prostate cancer epidemiology and pathology, and access to a unique cohort of men with prostate cancer who participated in two US prospective studies: the Physicians' Health Study (PHS) and the Health Professionals Follow-up Study (HPFS). Their goal in this proposal is to use their complementary and well-integrated expertise to develop free, open-source, FFPE-specific analytic tools, validate them theoretically and empirically, and use them to investigate prostate cancer molecular subtypes in a large and well-annotated cohort.