The adoption of Electronic Health Record (EHR) systems is growing at a fast pace in the U.S., and this growth results in very large quantities of patient clinical data becoming available in electronic format, with tremendous potentials, but also equally growing concern for patient confidentiality breaches. Secondary use of clinical data is essential to fulfill the potentials for high quality healthcare, improved healthcare management, and effective clinical research. NIH expects that larger research projects share their research data in a way that protects the confidentiality of the research subjects. De-identification of patient data has been proposed as a solution to both facilitate secondary uses of clinical data, and protect patient data confidentiality. The majority of clinical data found in the EHR is represented as narrative text clinical notes, and de-identification of clinical text is a tedious ad costly manual endeavor. Automated approaches based on Natural Language Processing have been implemented and evaluated, allowing for much faster de- identification than manual approaches. Clinacuity, Inc. proposes to develop a new system to automatically de-identify clinical notes found in the EHR, to then improve the availability of clinical text for secondary uses, as well as ameliorate the protection of patient data confidentiality. To establish the merit and feasibility of such a system, we will work on the following objectives: 1) create a reference standard for training and testing the text de-identification application, a reference standard that will include a random sample of clinical narratives with protected health information annotated by domain experts; 2) develop a prototype to automatically de-identify clinical text in near real-time, a prototype that will implement a novel stepwise hybrid approach to maximize sensitivity first (our priority for de-identification), and then filter out false positives to enhance positive predictive value, and also replace PHI with realistic substitutes for improved protection; and 3) test the prototype with the aforementioned reference standard, using a cross-validation approach for training and testing, and also train and test the prototype with a reference standard from another healthcare organization. Commercial application: To strengthen patient information confidentiality protection, the HITECH Act heightened financial penalties incurred for breaches of PHI, even introducing criminal penalties. These new severe consequences for violation of patient information confidentiality render protection requirements even more obvious, and automatic high-accuracy clinical text de-identification, as offered by the system Clinacuity proposes, will strongly help healthcare and clinical research organizations avoid such consequences. This system has potential commercial applications in clinical research and in healthcare settings. It will improve access to richer, more detailed, and more accurate clinical data (in clinical text) for clinical researchers. It will also ease research data sharing, and help healthcare organizations protect patient data confidentiality.