Maha R. Farhat, MD, is an Instructor of Medicine at Harvard Medical School on the tenure track and a staff physician in the Department of Pulmonary and Critical Care Medicine at Massachusetts General Hospital. She is completing a master's degree in biostatistics at the Harvard School of Public Health in May 2015. She has spent the last 4.5 years acquiring skills in Mycobacterium tuberculosis biology, epidemiology, bioinformatics and biostatistics. She has experience in the analysis of whole genome sequence data, drug resistance data and patient clinical outcome data, with a focus on identifying Mycobacterium tuberculosis genetic determinants of drug resistance. She has also developed new methods in this area. Dr. Farhat has 11 publications, 5 of them as first author, including high-impact and highly cited work in the journals Nature Genetics, Genome Medicine and the International Journal of Tuberculosis and Lung Disease. The short-term goals of this K01 award are to provide training for Dr. Farhat in critical aspects of data science, computational and evolutionary biology, advanced biostatistics and network science. Dr. Farhat's long-term goal is to become a leader in the field of Big Data analysis for infectious diseases. The proposed research and the training activities outlined in the proposal will position Dr. Farhat for her first R01 and an independent career as a physician scientist. Environment: Dr. Farhat will perform the interdisciplinary work outlined in this proposal at the distinguished Harvard Departments of Global Health and Social Medicine, Biostatistics, and Evolutionary Biology and the Institute for Quantitative Social Sciences. Dr. Farhat's mentorship team will include two world-renowned leaders in the fields of infectious diseases and Big Data, Dr. Megan Murray and Dr. Gary King, and two rising stars in the fields of network science and evolutionary biology, Dr. JP Onnela and Dr. Michael Desai. Dr. 
Murray, the principal mentor on this proposal, has mentored over 38 trainees, 9 of whom have gone on to independent research careers and 6 of whom have competed successfully for K awards. She is also PI on two recently awarded NIH/NIAID grants, a CETR U19 and a TBRU U19, and has over 350 peer-reviewed publications. To complement the expertise of her mentors, Dr. Farhat will be advised by Dr. Christiani, a practicing pulmonary and critical care physician and world-renowned researcher in the field of lung and environmental genetics. She will also collaborate and consult with Dr. Merce Crosas, a data scientist, and Dr. Pardis Sabeti, a computational biologist. She will rotate through Dr. Soumya Raychaudhuri's bioinformatics laboratory to diversify her exposure to biomedical Big Data. In addition, she will receive formal training in evolutionary biology, Bayesian and mixed-model biostatistics, computer science, leadership skills and grant writing. The collaborative opportunities, intellectual environment and resources available to Dr. Farhat are outstanding. Research: Infectious diseases continue to be a major cause of morbidity and mortality. Despite the availability of effective antimicrobials, pathogens are successfully evolving new disease phenotypes that allow them to resist killing by these drugs or, in other instances, to cause more severe disease manifestations or wider chains of transmission. Drug resistance (DR) is now common, and some bacteria have even become resistant to multiple types or classes of antibiotics. A key strategy in the fight against emerging pathogen phenotypes in infectious diseases is surveillance and early personalized therapy to prevent transmission and propagation of these strains. The timely initiation of antibiotic therapy to which the pathogen is sensitive has been shown to be the key factor influencing treatment outcome for a diverse array of infections. 
Molecular tests that detect microbial genetic mutations are particularly promising for surveillance and diagnosis of these pathogen phenotypes, but they depend on a comprehensive understanding of how mutations associate with those phenotypes. Currently, there is an explosion of pathogen whole genome sequence (WGS) data, increasingly generated by clinical laboratories. Data on disease phenotype may also be available, but methods for the analysis and interpretation of these Big Data are lagging. Here I propose tools to aid in this analysis, leveraging Big Data sets from Mycobacterium tuberculosis (MTB) and my prior work. Specifically, I propose to (1) develop a web-based public interface to several analysis tools, including a statistical learning model that can predict the MTB DR phenotype from its genomic sequence, (2) develop and study an MTB gene-gene network, based on WGS data, to improve our understanding of the effect of mutation-mutation interactions on the DR phenotype, and (3) study the performance of methods in current use for the association of genotype and phenotype in pathogens, and develop a generalizable power calculator for the best-performing method.