ABSTRACT Many regions of the United States, particularly rural and frontier areas, lack the resources to proactively identify isolated infectious disease outbreaks. Scott County, Indiana, experienced an HIV outbreak in 2014-2015 that resulted in approximately 200 new HIV infections. Had a vigilant Disease Intervention Specialist (DIS) not noticed an escalating number of HIV infections in Scott County over a brief period of time, the outbreak could have been much worse. An isolated outbreak in a resource-limited setting such as Scott County underscores the need for more innovative, automated, real-time HIV biosurveillance systems in non-urban areas. To date, digital HIV epidemiologic research has relied almost exclusively on Twitter; however, this reliance on a single source is likely too restrictive and has not yet yielded a promising method for predicting HIV outbreaks. More heterogeneous sources of data, in addition to social media, may more efficiently predict the arrival of HIV in a community with limited surveillance resources. The proposed health informatics research will analyze historical time series data collected from 2014 through 2016 to identify predictor variables that model the Scott County HIV outbreak. Data to be analyzed include: (1) emergency room (ER) admissions and discharges related to opioid use and to soft tissue infections related to drug abuse; (2) HIV testing surveillance data; (3) HCV incidence data; (4) search engine queries on relevant topics, such as Google or Bing searches for "HIV testing" and "Opana"; (5) law enforcement arrest records (particularly those related to opioid possession and distribution); and (6) electronic healthcare reimbursement data for HIV-related treatments (e.g., post-exposure prophylaxis).
Automated data/text mining and machine learning techniques will also be applied to (7) social media data (i.e., Twitter posts and Reddit forum posts) that refer to HIV, Opana, substance use, and other terms to determine whether trends in social media data could have predicted HIV's arrival in, and transmission throughout, Scott County. Using the diverse data sources listed above, our team will correlate the time series of key predictor variables to identify the data source (or sources) most predictive of a known HIV outbreak. If our team develops a health informatics approach and algorithm(s) that identify trends in social media and other electronic data indicating an imminent HIV outbreak, state and county health departments can use these "signals" to increase the number of HIV testing and counseling sites in the affected area; health care providers can screen more aggressively for HIV and other STIs; syringe-service programs can be mobilized rapidly and targeted more efficiently; contact tracing activities can be initiated; and post-exposure prophylaxis (PEP) and pre-exposure prophylaxis (PrEP) can be prescribed to those at risk for HIV infection in the geographic region of concern. This study will also examine feasibility issues related to the collection and analysis of electronic health information and social media data in HIV biosurveillance efforts, such as data accessibility, costs, and generalizability.
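As an illustration of what correlating the time series of a predictor variable with outbreak counts could look like, the following is a minimal sketch of a lagged Pearson correlation. It is not the project's actual method: the function names (`pearson`, `best_lead`), the weekly granularity, and the counts are all hypothetical and synthetic, and lagged correlation is only one simple way to ask whether a data source leads the outcome in time.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / sqrt(sxx * syy)


def best_lead(predictor, outcome, max_lag):
    """Return (lag, r) for the lag, in periods, at which the predictor
    shifted forward in time correlates most strongly with the outcome."""
    best = (0, pearson(predictor, outcome))
    for lag in range(1, max_lag + 1):
        # Pair predictor[t] with outcome[t + lag].
        r = pearson(predictor[:-lag], outcome[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic weekly counts (assumption, not real data): the outcome
# series is the predictor series delayed by three weeks.
er_visits = [0, 0, 1, 3, 7, 12, 9, 5, 2, 1, 0, 0, 0, 0, 0]
hiv_cases = [0, 0, 0] + er_visits[:-3]

lag, r = best_lead(er_visits, hiv_cases, max_lag=6)
print(f"best lead: {lag} weeks (r = {r:.2f})")
```

Running the same comparison for each candidate data source would give a rough ranking by lead time and strength of association. Real surveillance series would also need attention to seasonality, autocorrelation, and small counts, which this sketch ignores.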