The broad long-term objectives of this proposal are to create and evaluate an infrastructure (GeneSeek) to permit searching across heterogeneous source databases (genomic and citation databases) for relevant information needed for curation of an existing database of clinical knowledge (GeneClinics). The Specific Aims are: 1) to use a novel, general purpose knowledge representation language to capture the schema of an existing database of clinical knowledge (GeneClinics genetic testing database), 2) to build a shared schema for mediating cross database queries, by extending the schema of GeneClinics and incorporating pertinent schema elements from other structured and semi-structured information sources, 3) to create and test interfaces to the targeted genetic information sources (databases and other structured information) from the shared query mediation schema, 4) to adapt the existing Tukwila data integration system to implement cross database query planning, query execution, and query result aggregation in the context of our shared query mediation schema and the multiple structured (genetic) information sources to create the GeneSeek data integration system, 5) to evaluate the performance of the Tukwila based GeneSeek data integration system and the shared data schema for precision and recall in finding relevant information for curation of a clinical database (GeneClinics genetic testing database). The broad health relatedness of the project is that data integration tools are needed to help clinicians apply the ever- growing body of medical information to patient care. The tools are needed by curators of databases of medical knowledge as well as by the care providers themselves. Nowhere is the growth in information more apparent than in the Human Genome project thus the choice of genetics as a domain to test this data integration system. The specific genetics database whose curation the GeneSeek system will be evaluated against the GeneClinics database. If successful these data integration systems could be more broadly applied to other domains in biomedicine. The research design is to apply recent developments in data integration from the artificial intelligence and database areas of computer science to a real world clinical genetics data integration problem to evaluate the applicability of this system to biomedical information retrieval tasks. The methods are to expand an existing collaboration between the current GeneClinics content and informatics teams and investigators in the Department of Computer science to: 1) enhance the Tukwila data integration architecture and its related CARIN knowledge representation language, and 2) to use these tools and the existing GeneClinics data model to implement and evaluate this data integration system in the specific domain of medical genetics.