The availability of large volumes of electronic health records (EHRs) creates new opportunities to use them for health services, evidence-based medicine (EBM), and clinical research. However, such use is currently limited to narrow areas of clinical practice. Detecting relations between medical events in unstructured EHRs remains a major challenge, which makes patient cohort identification inaccurate, especially in cross-institutional settings. Preliminary work by the PI has shown that topic modeling can discover data semantics, which can then serve as important cues for detecting diverse relations in EHRs. Among these relations, coreference, temporal relations, and domain semantics are intertwined and positively correlated. To date, little research has combined these three kinds of relations to build a better patient cohort identification system. The PI therefore proposes to develop a relation detection framework for EHRs, empowered by topic modeling, for more accurate patient cohort identification. In the mentored phase, the PI will implement the relation detection framework under the guidance of the mentor team and will release it as open-source software so that it can be adapted for deployment at other institutions (Aim 1, K99). In the independent phase, the PI will research methods that facilitate rapid development, deployment, and cross-institutional portability of similar systems. Specifically, the PI will develop a hybrid design that uses the ICD-9, RxNorm, and MeSH ontologies to discover data semantics from EHRs and MEDLINE, and will investigate how to categorize the discovered data semantics in alignment with medical ontologies (Aim 2, R00). To enable other researchers to reuse the developed methods and software resources, and more importantly to correct or adjust the discovered data semantics, the PI will develop a toolkit that supports the construction and deployment of similar systems (Aim 3, R00).
The independent phase will be conducted in collaboration with both UTHealth and the University of Maryland. The PI's career goal is to become a scientific leader in clinical informatics, focusing on relation detection in EHRs for efficient patient cohort identification. The PI has a strong background in computational linguistics and extensive experience in processing and analyzing clinical records, and will be mentored by Drs. Hongfang Liu, Christopher Chute, and Terry Therneau, who have complementary areas of expertise. The mentored phase will take place at Mayo Clinic, Rochester, where the PI will take courses on the US healthcare system, health systems engineering, clinical statistics, and clinical epidemiology, and will receive mentored training in health informatics, an area he needs to strengthen because it was not part of his formal PhD training. In the independent R00 phase, the PI will strive to make independent scientific contributions to the use of informatics in healthcare by carrying out Aims 2 and 3 and by pursuing independent internal and external collaborations. Completing the proposed work will enable the PI to seek further funding to pilot clinical deployment of the developed systems, measure their clinical impact, and scale the approach to other clinical domains and institutions. The career award will enable the PI to establish himself as an independent investigator and to make significant contributions to advancing medical knowledge systems, clinical practice, and clinical research.