This is an exploratory application aimed at developing novel computational approaches for identifying infectious agents that contribute to cancer. To achieve this, we will merge data from two distinct approaches: gene expression profiling and metagenomics. Approximately 20% of all cancers worldwide are associated with infectious agents, including viruses, bacteria and parasites. This number is likely an underestimate, and many more cancers may be caused by agents that await discovery. One powerful approach to uncovering potential cancer-causing microorganisms is virtual subtraction, in which tumor genomic sequences or gene expression profiles are searched for non-human sequences. Potential cancer-causing agents are then identified among these non-human sequences by searching public nucleic acid and protein databases for similar sequences that can be associated with a known virus, bacterium or other organism. A major limitation of this approach is that most organisms, especially microorganisms, are not represented in the databases. Metagenomics is an approach in which all microorganisms in a specific biome are sampled and then deep sequenced. Individual species are then identified by comparing the sequence reads obtained with sequences deposited in public databases. Studies from a number of laboratories, including our own, have shown that most sequences obtained in metagenomic surveys do not match anything in existing databases, suggesting that they are derived from previously uncharacterized agents. For example, our studies suggest that the 3,000 or so currently known viruses represent less than 0.01% of viruses in nature. Similarly, the vast majority of bacterial species await discovery and characterization. We propose to search gene expression profile data to determine whether any of these novel metagenomic sequences are expressed in tumors.
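The two steps described above, virtual subtraction followed by a search against metagenomic sequences, can be illustrated with a minimal sketch. It is not the project's actual pipeline: it assumes a toy k-mer overlap test in place of a real aligner or database search, it ignores reverse complements and sequencing errors, and the function names (`virtual_subtract`, `match_to_contigs`) and thresholds (`max_human_fraction`, `min_shared`) are hypothetical choices made for illustration only.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_index(seqs: Iterable[str], k: int) -> Set[str]:
    """Union of the k-mers of every sequence in seqs."""
    index: Set[str] = set()
    for s in seqs:
        index |= kmers(s, k)
    return index


def virtual_subtract(reads: Iterable[str], human_index: Set[str], k: int,
                     max_human_fraction: float = 0.5) -> List[str]:
    """Keep reads whose fraction of human k-mers is at most the threshold.

    Assumption: k-mer overlap stands in for alignment to the human genome.
    """
    kept: List[str] = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, so it cannot be classified
        human_fraction = len(read_kmers & human_index) / len(read_kmers)
        if human_fraction <= max_human_fraction:
            kept.append(read)
    return kept


def match_to_contigs(reads: Iterable[str], contigs: Dict[str, str], k: int,
                     min_shared: int = 1) -> Dict[str, int]:
    """Count, per metagenomic contig, the reads sharing >= min_shared k-mers."""
    contig_kmers = {name: kmers(seq, k) for name, seq in contigs.items()}
    hits = {name: 0 for name in contigs}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, ck in contig_kmers.items():
            if len(read_kmers & ck) >= min_shared:
                hits[name] += 1
    return hits
```

In this sketch, tumor expression reads would first pass through `virtual_subtract` with an index built from human reference sequences, and the surviving reads would then be passed to `match_to_contigs` with uncharacterized metagenomic contigs. A contig with a nonzero count is only a candidate for follow-up, not evidence of an association with cancer.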
We will also develop the computational tools that will allow the uncharacterized viruses, bacteria or other organisms that we identify to be isolated and their association with cancer to be studied. The identification of new potentially tumorigenic infectious agents will have a direct impact on the diagnosis and treatment of cancer.

PUBLIC HEALTH RELEVANCE: To design diagnostics and therapies for different cancers, we must know what causes them. This project is aimed at discovering infectious agents that cause or contribute to cancer. The identification and characterization of these agents will thus lead to new methods for the diagnosis and treatment of cancer.