The long-term aim of this project is to use natural language processing (NLP) to build a high throughput tool for facilitating cancer research by automatically extracting and organizing clinical and genetic information from the Electronic Medical Record (EMR) and from journal articles. Our research involves advanced NLP techniques to: 1) enable the mining of phenotypic and genotypic data in the EMR; 2) automatically amass knowledge concerned with cancer and biomolecular relationships from journals; 3) develop a WEB-enabled visualization tool for researchers that will present diverse views of the knowledge; and 4) develop an Infrastructure that will link to the clinical data warehouse at New York Presbyterian Hospital (NYPH) and to GeneWays, a related project that allows researchers to visualize pathways. More specifically, MedLEE (the NLP system we developed that extracts and encodes clinical and environmental information from the EMR) will be extended to extract genetic information contained in the EMR; subsequently, twelve years of patient reports will be processed and the extracted data added to the warehouse. In addition, a new system, PhenoGenes, will be developed based on MedLEE and GeneWays (which contains another NLP system we developed that extracts and codifies biomolecular relations from journal articles). PhenoGenes will capture biomolecular interactions directly associated with the treatment, diagnosis, and prognosis of cancer. It will also generate an XML knowledge base that will integrate and organize the information that will be captured, and a Web-enabled tool that will allow users to browse and view the knowledge clustered according to different orientations (e.g. gene, disease, tissue, interaction, etc.). The knowledge base will be linked to the GeneWays system, so that relevant pathways can be visualized. MedLEE is utilized operationally at NYPH. It also has been demonstrated that both NLP systems are highly effective. This current project builds upon our experience and success with these systems. The availability of related compatible clinical and biomolecular NLP systems, provide an exceptional opportunity to pave the way for capture, integration and organization of phenotypic and genotypic data and knowledge that will be used to radically improve patient care.