A mathematical or computational model of infectious disease transmission represents the process by which an infection spreads from one person to another. Such models have a long history within infectious disease epidemiology, and are useful tools for giving insight into the dynamics of epidemics and for evaluating the potential effect of control methods. The overall objective of this project is to substantially improve the methods by which models of infectious disease transmission are calibrated against biological and disease surveillance data. This will both improve the utility of models as tools for analyzing data on infectious disease outbreaks (for instance, to provide public health agencies with more rapid and reliable estimates of how transmissible and lethal a new virus is) and improve the reliability of models as tools for predicting the likely effect of different interventions (such as vaccines or case isolation), helping policy makers make more informed decisions about control policies. As with many areas of biology and medicine, the data landscape for infectious disease modeling is changing rapidly. Larger and more complex datasets are becoming available that cover many different aspects of the interaction between a pathogen and the human population: clinical episode data, genetic data about fast-evolving pathogens, animal-model transmission data, and community-based representative serological data.
The specific aims of our project are to: (a) develop new machine-learning-based methods to discover interesting patterns in complex datasets related to the transmission of infectious disease, so as to better specify subsequent mechanistic mathematical or computational models; (b) derive new approaches for using more than one type of data simultaneously to calibrate transmission models; and (c) derive new methods of parameter estimation for simulations that model the spatial spread of infection or model both the transmission and genetic evolution of a pathogen. We will achieve these aims in the applied context of research on three key infections: emerging infectious diseases (such as MERS-CoV, the novel coronavirus currently spreading in the Middle East), influenza, and Streptococcus pneumoniae (a major bacterial pathogen). Examples of the scientific questions we will address that cannot be answered with current methods are: (i) how many unobserved cases of MERS-CoV have occurred so far (to be answered using data on case clusters, the spatial distribution of cases and viral genetic sequences)? (ii) how many people in different age groups are infected with influenza each year, and how do their immune systems respond to infection (to be answered using data on case incidence and serological testing of the population)? (iii) how much is vaccination, coupled with prescribing practices, influencing the emergence of resistant strains of pneumococcus (to be addressed with data on antibiotic and vaccine use, case incidence and bacterial strain frequency)?