The Veterans Health Information Systems and Technology Architecture (VistA) is an integrated system of software applications that directly supports patient care at Veterans Health Administration (VHA) healthcare facilities. To facilitate veteran care, VistA maintains a massive repository of patient-related data, including over 1.3 billion textual documents (e.g., progress notes, discharge summaries). The Computerized Patient Record System (CPRS), a front-end application that interfaces with the VistA data repository, allows clinicians to enter, review, and update information concerning all aspects of a veteran's care in their electronic health record (EHR). For veterans with complex and chronic diseases, thousands or tens of thousands of text-based progress notes may be associated with their EHR. Because CPRS lacks sophisticated search capabilities, searching through this vast amount of textual data to find useful information can be an arduous task. The VistA EHR system represents the cornerstone of clinical care in the VA. This pilot study is the first step in a program of research whose ultimate goal is to make finding relevant information within a veteran's EHR easier for clinicians, thus improving processes of care and, potentially, patient outcomes. The purpose of the proposed study is to determine whether information retrieval (IR) techniques found to be useful in searching large text-based data repositories, such as the Internet or PubMed, can be applied to progress notes from VistA. In addition, we will explore whether including information about clinically relevant concepts from a medical ontology improves IR results. A total of four IR systems will be examined: (1) vector space model (baseline); (2) vector space model enhanced with ontology weights; (3) latent semantic indexing model; and (4) latent semantic indexing model enhanced with ontology weights.
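As a rough illustration of the baseline and ontology-weighted vector space models listed above, the following minimal sketch ranks notes by TF-IDF cosine similarity to a selected note. The function names, the smoothed IDF formula, the whitespace tokenizer, and the `concept_weights` mapping (a hypothetical stand-in for ontology-derived term weights) are all assumptions made for illustration; this is not the study's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real notes need clinical NLP).
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency over the note collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf, concept_weights=None):
    # TF-IDF vector; terms found in concept_weights are scaled by that
    # weight (hypothetical ontology boost), all other terms by 1.0.
    weights = concept_weights or {}
    tf = Counter(tokens)
    return {t: f * idf[t] * weights.get(t, 1.0) for t, f in tf.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_notes(query_idx, notes, concept_weights=None):
    # Rank every other note by similarity to the selected note.
    docs_tokens = [tokenize(n) for n in notes]
    idf = build_idf(docs_tokens)
    vecs = [vectorize(t, idf, concept_weights) for t in docs_tokens]
    query = vecs[query_idx]
    scores = [(i, cosine(query, v)) for i, v in enumerate(vecs) if i != query_idx]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy notes invented for this example.
notes = [
    "wound culture positive mrsa",
    "mrsa wound care dressing change",
    "diet counseling low sodium",
]
baseline = rank_notes(0, notes)
weighted = rank_notes(0, notes, concept_weights={"mrsa": 3.0})
```

In this toy collection, boosting a shared clinical term raises the similarity between the two notes that mention it, which is the intended effect of the ontology-weighted variants. A latent semantic indexing variant would additionally apply a truncated singular value decomposition to the term-document matrix before computing similarities.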
The SNOMED-CT ontology will be used, with concepts weighted by their relative importance within the ontology as computed by Google's PageRank algorithm. The four IR systems will be evaluated on their ability to find progress notes relevant to a selected note, where relevance will be judged by the clinical co-investigators. The document collection to be searched will consist of all progress notes over a 17-month period from a random sample of 20 patients from the James A. Haley Veterans Medical Center (JAHVMC) who tested positive for methicillin-resistant Staphylococcus aureus (MRSA) and five who did not test positive. Because MRSA infections are associated with prolonged hospital stays and with chronic conditions, these patients form an ideal cohort for testing IR systems. The EHRs of MRSA-positive patients are likely to contain large numbers of progress notes of a heterogeneous nature (e.g., physician notes, nursing notes, laboratory results). The large quantity and diverse types of notes associated with this complex condition will provide an excellent test of the effectiveness of the proposed IR techniques. The IR systems will be evaluated using measures derived from precision and recall. The exact Wilcoxon signed-rank test, a nonparametric test, will be used to compare all pairwise combinations of IR systems on each performance measure.
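To make the evaluation measures concrete, the sketch below computes precision and recall for a retrieved set, plus average precision for a ranked list. The abstract does not name the specific derived measures, so the choice of average precision, the function names, and the note identifiers are assumptions for illustration only.

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved notes that are relevant.
    # Recall: fraction of relevant notes that were retrieved.
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    # Mean of the precision values at each rank where a relevant note
    # appears, divided by the total number of relevant notes.
    relevant_set = set(relevant)
    hits = 0
    total = 0.0
    for rank, note_id in enumerate(ranked, start=1):
        if note_id in relevant_set:
            hits += 1
            total += hits / rank
    return total / len(relevant_set) if relevant_set else 0.0


# Hypothetical ranked output of one IR system and hypothetical
# relevance judgments for one selected note.
ranked = ["n3", "n7", "n1", "n9"]
relevant = {"n3", "n1", "n5"}

p_at_2, r_at_2 = precision_recall(ranked[:2], relevant)
ap = average_precision(ranked, relevant)
```

Computing such a score for each selected note under each of the four systems would yield paired observations, which is the form of data a signed-rank comparison between two systems requires.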