Overview: Highly mutable RNA viruses, such as human immunodeficiency virus and hepatitis C virus, are major causes of morbidity and mortality worldwide. The hallmark of RNA viruses is their extremely high genetic diversity, which allows them to rapidly establish new infections, escape the host's immune system and develop drug resistance. The emergence of next-generation sequencing technologies promises to revolutionize the fields of virology and epidemiology by making it possible to sample and characterize millions of intra-host viral variants in thousands of infected individuals. However, our understanding of the mechanisms of disease spread and viral evolution is still limited due to the lack of computational methods for processing, integration and analysis of biomedical big data. The overarching goal of this project is to develop a comprehensive family of innovative algorithms and models that make it possible to describe, analyze, understand and predict complex multidimensional non-linear disease dynamics.

Intellectual Merit: The proposed research will be conducted by an interdisciplinary team composed of biologists, mathematicians, molecular epidemiologists and computer scientists with extensive expertise in the areas relevant to the project. The project will target highly important epidemiological and biomedical problems, including the development of efficient and scalable computational methods for surveillance of disease spread, the modeling of epidemiological dynamics by incorporating intra-host and inter-host evolutionary dynamics into a single framework, and the design of computational tools that enable health care professionals to use data analysis results. The proposed algorithms and models will be validated using massive molecular and epidemiological data generated by project collaborators from CDC and Georgia Tech, as well as data available from public sources. The algorithms will be distributed to researchers and health care workers as free open-source packages and cloud-based online tools.
In particular, they will be incorporated into the Global Health Outbreak and Surveillance Technology, a web-based data analysis system currently being developed at CDC. Research findings will be broadly disseminated via journal publications and conference presentations, including the International Symposium on Bioinformatics Research and Applications and the Workshop on Computational Advances in Molecular Epidemiology, both organized by the PIs.