Project Summary and Abstract Narratives of electronic health records (EHRs) contain useful information that is difficult to automatically extract, index, search, or interpret. Natural language processing (NLP) technologies can extract this information and convert it in to a structured format that is more readily accessible by computerized systems. However, the development of NLP systems is contingent on access to relevant data and EHRs are notoriously difficult to obtain because of privacy reasons. Despite the recent efforts to de-identify and release narrative EHRs for research, these data are still very rare. As a result, clinical NLP, as a field has lagged behind. To address this problem, since 2006, we organized thirteen shared tasks, accompanied with workshops and journal publications. Twelve of these shared tasks have focused on the development of clinical NLP systems and the remaining one on the usability of these systems. We have covered both depth and breadth in terms of shared tasks, preparing tasks that study cutting-edge NLP problems on a variety of EHR data from multiple institutions. Our shared tasks are the longest running series of clinical NLP shared tasks, with ever growing EHR data sets, tasks, and participation. Our most popular three data sets have been cited 495 (2010 data), 284 (2006 de-id data), and 274 (2009 data) times, respectively, representing hundreds of articles that have come out of these three data sets alone. Our goal in this proposal is to continue the efforts we started in 2006 under i2b2 shared task challenges (i2b2, NIH NLM U54LM008748, PI: Kohane and R13 LM011411, PI: Uzuner) to de-identify EHRs, annotate them with gold- standard annotations for clinical NLP tasks, and release them to the research community for the development and head-to-head comparison of clinical NLP systems, for the advancement of the state of the art. Continuing our efforts under National NLP Clinical Challenges (n2c2) based at the Health Data Science program of the newly established Department of Biomedical Informatics at Harvard Medical School, we aim to form partnerships with the community to grow the shared task efforts in several ways: (1) grow the available de-identified EHR data sets through partnerships that can contribute to the volume and variety of the data, and (2) grow the available gold-standard annotations in terms of depth and breadth of NLP tasks. Given these aims and partnerships, we plan to hold a series of shared tasks. We will complement these shared tasks with workshops that meet in conjunction with the Fall Symposium of the American Medical Informatics Association and with journal special issues so that advancement of the state of the art can be sped up and future generations can build on the past.