The overall objective of this research is to develop computer techniques for extracting information, rather than titles or citations, from the medical literature and from computerized medical data stores written in natural language (e.g., hospital records). The methods, which were established in previous work of the principal investigator, do not depend on prior knowledge of the subject matter but use procedures and computer programs based on formal linguistic analysis. The major components are (1) a syntactic analysis program equipped with a comprehensive grammar of English; (2) a clustering program, which operates on the output of the syntactic analyzer and groups words into semantic classes based on the similarity of their distributions vis-a-vis other words in the syntactically analyzed sentences; (3) the method of sublanguage grammars, which uses (1) and (2) together with a manual linguistic analysis to state information structures applicable to texts in a particular subject matter area; and (4) a formatting program, which maps texts in the given subject matter area into the information structures developed in (3). While the individual components had been developed previously, the present investigation is the first attempt to apply the technique as a whole (called "information formatting") to a corpus of medical narrative and to have the technique evaluated for its health relevance by medical users.

BIBLIOGRAPHIC REFERENCES:
Anderson, B., I.D.J. Bross and N. Sager, Grammatical Compression in Notes and Records, Proceedings of the 13th Annual Meeting of the Association for Computational Linguistics, American Journal of Computational Linguistics, vol. 4, 1975.
Hirschman, L., R. Grishman and N. Sager, From Text to Structured Information--Automatic Processing of Medical Reports, Proceedings of the National Computer Conference, AFIPS Press, Montvale, N.J., 1976.
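
As an illustration of component (2) in the abstract above, the following is a minimal sketch of grouping words into classes by the similarity of their syntactic distributions. It is not the investigators' program: the triple format, the sample data, the cosine measure, the greedy single-pass grouping, and the names `TRIPLES`, `context_vectors`, `cosine`, and `cluster` are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical analyzer output: (word, syntactic relation, co-occurring word).
# The real output format of the syntactic analyzer is not given in the abstract.
TRIPLES = [
    ("fever", "subject_of", "persist"),
    ("fever", "object_of", "develop"),
    ("cough", "subject_of", "persist"),
    ("cough", "object_of", "develop"),
    ("penicillin", "object_of", "receive"),
    ("penicillin", "object_of", "prescribe"),
    ("aspirin", "object_of", "receive"),
    ("aspirin", "object_of", "prescribe"),
]


def context_vectors(triples):
    """Count, for each word, the (relation, other word) contexts it occurs in."""
    vectors = defaultdict(Counter)
    for word, relation, other in triples:
        vectors[word][(relation, other)] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(triples, threshold=0.5):
    """Greedy single pass: a word joins the first class containing a
    sufficiently similar member, otherwise it starts a new class."""
    vectors = context_vectors(triples)
    classes = []
    for word in sorted(vectors):
        for cls in classes:
            if any(cosine(vectors[word], vectors[m]) >= threshold for m in cls):
                cls.append(word)
                break
        else:
            classes.append([word])
    return classes


if __name__ == "__main__":
    for cls in cluster(TRIPLES):
        print(cls)
```

On this toy data, words that share syntactic contexts (for example, objects of the same verbs) fall into the same class, which is the distributional idea the abstract describes. A realistic system would need a much larger analyzed corpus and a more careful clustering criterion.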