The eTfor2 project will develop and evaluate open-source programs and knowledge representations to better characterize patients for translational and clinical research studies. The project addresses National Library of Medicine (NLM) RFA initiatives for: (a) information & knowledge processing, including natural language processing and text summarization, (b) approaches for linking phenomic and genomic information, and (c) integration of information from heterogeneous sources. Translational studies correlate clinical patient descriptors (phenome) with results of genomic investigations, e.g., genome-wide association studies (GWAS). Standard methods for defining phenotypes require costly, labor-intensive cohort enrollments to identify patients with diseases and appropriate controls. Recently, translational and clinical researchers have used electronic medical record (EMR) data as an alternative to identifying patient characteristics. However, EMR case extraction requires substantial manual review and tuning for case selection, due to the inaccuracies inherent in ICD9 billing codes. While relevant and useful natural language processing (NLP) approaches to facilitate EMR text extraction have proliferated, the target patient descriptors these approaches employ typically remain non-standard and locally defined, and vary from disease to disease, project to project and institution to institution. At best, such NLP applications use standard terminology descriptors such as SNOMED-CT as EMR extraction targets. Yet, there is no generally utilized standard knowledge base that links such extractable descriptors to an academic-quality knowledge source detailing what findings have been reliably reported to occur in each disease. To facilitate translational and clinical research, the eTfor2 project will make available an open-source, evidence-based, electronic clinical knowledge base (KB) and related NLP tools enabling researchers at any site to extract a standard target set of EMR-based phenomic descriptors at both the finding and disease levels. It will further include diagnostic decision support logic to confirm the degree of support for patients' diagnoses in their EMR records. The eTfor2 project will decrease effort required to harvest EMR patient descriptors for clinical and translational studies, and enable new translational work that identifies genomic associations at both finding and disease levels. The eTfor2 resources should improve the quality and cross-institutional validity of EMR-based translational and clinical studies.