Project Summary

More than 4,000 systematic reviews are performed each year in the fields of environmental health and evidence-based medicine, with each review requiring, on average, between six months and one year of effort to complete. One of the most time-consuming and repetitive aspects of this endeavor involves extraction of detailed information from a large number of scientific documents. The specific data items extracted differ among disciplines, but within a given scientific domain, certain data points are extracted repeatedly for each review conducted. Research on the use of natural language processing (NLP) for extracting individual data elements has shown that it has the potential to greatly reduce the laborious, time-intensive, and repetitive nature of this step. However, there is currently no integrated, automatic data extraction platform that meets the needs of the systematic review community. We propose a web-based data extraction software platform specifically designed for use in the domain of systematic review. By combining multiple state-of-the-art data extraction methods utilizing NLP, text mining and machine learning into a single, unified user interface, we will empower the end user with a powerful and novel tool for automating an otherwise arduous task.

The research we propose encompasses three specific aims: (1) develop new data extraction models using deep learning and a new technique called "data programming"; (2) develop a web-based platform to semi-automate the process; (3) design protocols and standards for packaging extraction models as software components and integrating work done by other research groups and vendors. In the first aim, we will contribute novel data extraction modules designed and trained specifically to extract data elements of interest to those conducting systematic reviews in the domain of environmental health.
For this research, we will employ state-of-the-art machine learning, NLP and text mining methodologies to train and evaluate several novel extraction components. In our second aim, we will develop a web-based workbench which will allow users to upload scientific documents for automated data extraction. Our system will also be designed to allow for integration of data extraction approaches (components) from other research groups, thus enabling end users to choose from a wide variety of advanced data extraction methodologies within one unified and intuitive software environment. In our third aim, we will develop new protocols to standardize the inputs and outputs for data extraction components. The resulting interface, which will enable seamless integration of third-party extraction components into the workbench, will also facilitate the incorporation of feedback from users such that extraction components can be continuously improved based on real-time data.

Our overarching goal is to translate emerging semi-automated extraction technologies out of the lab and into practical software and to bring to market both the software itself and several premium data extraction components. The results of the research conducted for Aims 1-3 represent the first step in this direction and will provide the foundation for future developments. These results will take us one step closer to the dream of creating "living systematic reviews," which are maintained using automated or semi-automated methods and updated regularly as new evidence becomes available.