BACKGROUND
Ischemic stroke is an important cause of morbidity, mortality, and cost within the Veterans Health Administration (VHA). Accurately identifying who has had a stroke is vital to many ongoing VHA research and quality improvement efforts. However, current approaches to ascertaining this outcome have important limitations. Chart review, while accurate, is extremely resource-intensive, which limits the size of projects that rely on it. Approaches using administrative data (i.e., ICD-9 codes) are less resource-intensive but also less accurate. We propose a novel approach to ascertaining ischemic strokes from VHA automated data using natural language processing and machine learning. These methods apply artificial intelligence principles to "teach" a computer to extract clinical data from free-text notes with acceptable sensitivity and specificity.

OBJECTIVES
Aim 1: Develop and evaluate an automated approach to extracting incident ischemic stroke from the free text of VHA medical records.
Aim 2: Deploy this empirically evaluated approach on an existing database of approximately 150,000 patients who have received warfarin (an oral anticoagulant) from the VHA and are thus at elevated risk for ischemic stroke.

METHODS
Using chart review as our gold standard, we will develop algorithms to extract the ischemic stroke outcome from VHA medical records. We will begin by annotating clinical records, highlighting information that indicates which patients did or did not have a stroke. We will then divide the charts into a training set and a validation set. The training set will be used to inform the design of an algorithm for automated detection of ischemic stroke. The resulting algorithm will be evaluated on the validation set in terms of recall, precision, and their harmonic mean (the F-measure). Recall is equivalent to sensitivity and precision to positive predictive value; the F-measure combines the two into a single summary of overall performance.
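As an illustration only, and not the project's actual algorithm, the evaluation metrics named above could be computed from chart-review labels and algorithm output roughly as follows. The function name `evaluate` and the toy labels are hypothetical and are not drawn from any VHA data.

```python
def evaluate(gold, predicted):
    """Return (recall, precision, f_measure) for binary stroke labels.

    gold      -- chart-review labels (True = ischemic stroke), the gold standard
    predicted -- labels assigned by the automated algorithm
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives

    # Recall (sensitivity): share of true strokes that the algorithm found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision (positive predictive value): share of flagged charts that are true strokes.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return recall, precision, f_measure


# Toy validation set with made-up labels, for illustration only.
gold = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, False, True, False, True, False]

recall, precision, f_measure = evaluate(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f} F={f_measure:.2f}")
```

In this toy example the algorithm finds three of the four true strokes and raises one false alarm, so recall, precision, and the F-measure all equal 0.75.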
We will then deploy this validated algorithm on an existing database of patients known to be at elevated risk of stroke.

ANTICIPATED IMPACT
The proposed project will greatly improve ascertainment of ischemic stroke from the VHA electronic health record, combining the accuracy usually associated with chart review with the low cost and large datasets usually associated with ICD-9-based approaches. Algorithms developed through this project will be made available to any VHA entity wishing to use them for research or quality improvement, which could facilitate VHA efforts relating to the prevention and management of ischemic stroke.