The research objective of this project is to extract and use information from keyword-frequency data obtained from Online Social Networks (OSNs) such as Facebook, MySpace and Twitter, in order to provide timely prediction of the emergence and spread of an influenza epidemic. Reports of Influenza-Like Illness (ILI) cases by the Centers for Disease Control and Prevention (CDC), though authoritative, typically lag by 1-2 weeks because the reporting process is largely manual. Public health authorities need the earliest possible warning to ensure effective intervention, so more efficient and timely methods of estimating influenza incidence are urgently needed. More and more people use OSNs every day to talk about their daily activities and events. When they are sick, they often post announcements such as "I'm coughing and sneezing and feeling sick." Although these data are "noisy" individually, in the sense that not everyone who sneezes and coughs is infected with an influenza virus, in aggregate they provide a previously untapped data source that can be transformed into a large-scale picture of the underlying epidemic pattern in time and space. Such a picture will have a very short time lag, since OSN posts are nearly concurrent with the events they describe and can be obtained in near real time. Already, Google Flu Trends uses web search terms such as "influenza complication" and "cold remedy" collectively to predict the onset of the annual influenza season 1-2 weeks ahead of the CDC ILI data (and 3 or more weeks ahead of the reports). It is thus expected that the "I have a cold" status updates and "get well soon" messages exchanged between OSN users and their friends may provide even earlier and more robust prediction. Such data will be available initially from scanning public OSN profiles and blogs.
In the near future, they will be available from Facebook's new "Facebook Lexicon" interface, which enables the public to issue customized queries of its aggregated data. This project will develop an automated system to aggregate and transform OSN data into information for the early detection and prediction of temporal and geographic influenza incidence. It will be capable of (1) novelty detection, to detect the transition from a "normal" baseline situation to a pandemic, (2) ILI prediction using Autoregressive Moving Average (ARMA) time series models, to provide a valuable "preview" of possible scenarios, and (3) nonlinear filtering, to enhance the predictive power of mathematical models of influenza, whose parameters are typically not known precisely. Such a system will become a valuable tool for public health authorities, for example by becoming part of the CDC's BioSense program. It also offers commercial potential for supply chain prediction, and is therefore likely to be adopted by OSN hosts such as Facebook, further improving influenza epidemic prediction and helping to maintain public health and safety.

PUBLIC HEALTH RELEVANCE: The objective of the project is to provide timely prediction of the emergence and spread of an influenza epidemic by using data from Online Social Networks such as Facebook, MySpace and Twitter. The data can include relative frequencies of postings such as "I'm coughing and sneezing and feeling sick," which, when aggregated, can be transformed into a large-scale picture of the underlying epidemic pattern in time and space. An automated system will be developed using several data mining technologies combined with mathematical models of influenza, and it will run in near real time to give public health authorities an early warning to ensure effective intervention.
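
As a rough illustration of the relative-frequency aggregation described above, the following Python sketch computes, for each week, the fraction of postings that mention an illness-related keyword. It is a minimal example under stated assumptions, not the project's actual pipeline: the keyword list, the `weekly_ili_frequency` function, the `(week, text)` input format and the sample postings are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical keyword stems; the project description does not specify a list.
# Plain substring matching is deliberately crude and will produce false hits
# (e.g. "flu" also matches "fluent"), which is part of the "noise" in such data.
ILI_KEYWORDS = ("cough", "sneez", "feeling sick", "flu", "fever")


def weekly_ili_frequency(posts):
    """Return {week: fraction of that week's posts mentioning an ILI keyword}.

    `posts` is an iterable of (week_label, text) pairs (an assumed format).
    """
    totals = defaultdict(int)  # number of posts seen per week
    hits = defaultdict(int)    # number of posts with at least one keyword
    for week, text in posts:
        totals[week] += 1
        lowered = text.lower()
        if any(keyword in lowered for keyword in ILI_KEYWORDS):
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in totals}


# Made-up sample postings, for illustration only.
sample_posts = [
    ("2009-W01", "I'm coughing and sneezing and feeling sick"),
    ("2009-W01", "Great game last night"),
    ("2009-W02", "Home with the flu, get well soon to everyone else"),
    ("2009-W02", "Still have a fever"),
    ("2009-W02", "Sneezing all day"),
    ("2009-W02", "New phone arrived"),
]

frequencies = weekly_ili_frequency(sample_posts)
```

A weekly series of this kind is the sort of input that the novelty detection and ARMA forecasting steps named above would take. Normalizing by the total number of postings per week, rather than using raw counts, keeps the signal comparable as overall OSN activity grows.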