The long-term aims of this project are to further medical text processing methodologies in order to broaden the availability of patient data for automated clinical applications. At present, natural language is the most widespread, convenient, and comprehensive means by which medical personnel report clinical information. A procedure that processes narrative text to extract and codify the clinical information it contains will give automated decision support, quality assurance, and research applications reliable access to a much broader range of patient data than is presently possible. Enhancing the capabilities of automated research, quality assurance, and decision support will have a significant effect on the quality of patient care. It has already been demonstrated that clinical data can be successfully mapped into structured forms. Some text processing systems that have proved effective are applicable only to limited text; others have more sophisticated language capabilities but are not precise or reliable enough for clinical purposes. A specific aim of this project is to integrate and enhance the positive aspects of several different techniques (pattern matching, semantic-based, semantic and syntactic-based, and statistical) to create a robust and reliable processor that can be incrementally developed and extended within one uniform framework. Initially, the information to be extracted will be limited to that which is useful for a specific application: the decision support component of the Clinical Information System (CIS) at Columbia Presbyterian Medical Center (CPMC). This work is being done in conjunction with CPMC so that the effectiveness of this technique can be realistically evaluated in a clinical setting. Clinical data from radiology reports will be automatically codified and inserted into the clinical patient database at CPMC.
This is possible because the schema of the relational patient clinical database is designed to accommodate the type of complex clinical information that is found in natural language. To integrate the output of the text processor with the automated applications that use the data, a controlled vocabulary (with codes) for radiology will be developed and incorporated into the Medical Entities Dictionary (MED) at CPMC, which is an object-oriented knowledge base of clinical terms. The output of the text processor will be in a form that is compatible with the definitions of terms in the MED. An automated procedure will match the output forms with the MED forms of clinical terms in order to obtain precise codes for the data. The decision support application, as well as other computerized applications, will have reliable access to the clinical data in the patient database because they will reference only codes corresponding to terms in the MED.
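
The matching step described above, in which a structured output form is compared with the defining forms of vocabulary terms to obtain a code, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `MedTerm` class, the attribute names (`finding`, `body_location`), the placeholder codes, and the "most specific contained form wins" rule are hypothetical and do not describe the actual MED schema or the project's matching procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MedTerm:
    """Hypothetical stand-in for one vocabulary entry (not the real MED schema)."""
    code: str        # placeholder code, invented for this example
    name: str
    form: frozenset  # defining (attribute, value) pairs


# Toy vocabulary; the entries and codes are invented for illustration only.
TOY_MED = [
    MedTerm("X0001", "opacity", frozenset({("finding", "opacity")})),
    MedTerm(
        "X0002",
        "opacity of left lower lobe",
        frozenset({("finding", "opacity"), ("body_location", "left lower lobe")}),
    ),
]


def match_code(output_form: dict) -> Optional[str]:
    """Return the code of the most specific term whose defining form is
    fully contained in the text processor's output form, or None."""
    pairs = set(output_form.items())
    candidates = [term for term in TOY_MED if term.form <= pairs]
    if not candidates:
        return None
    # Assumed tie-breaking rule: prefer the term with the most defining pairs.
    return max(candidates, key=lambda term: len(term.form)).code


# A structured form such as a text processor might emit for a phrase like
# "probable opacity in the left lower lobe" (assumed shape).
example_form = {
    "finding": "opacity",
    "body_location": "left lower lobe",
    "certainty": "probable",
}
print(match_code(example_form))
```

Under these assumptions, downstream applications would store and query only the returned code, so they never depend on the wording of the original report.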