User-friendly, intelligent, computer-based data mining tools are critical if biomedical and toxicology researchers are to uncover the knowledge contained in biomedical and toxicology databases and to maximize cross-scientific benefits. For example, continuing scientific advances are producing massive accumulations of macromolecular sequence data, most of which cannot be effectively accessed by researchers. Further, toxicogenomic databases lack the ontology structures provided by the Gene Ontology, which makes data searches laborious. This research sets forth innovative approaches designed to allow researchers to search across biomedical gene and toxicology ontology domains in free text using semantic similarity techniques. It will enhance cross-scale scientific discovery and facilitate effective management of the exponentially growing knowledge base of research and clinical experience. In Phase I we will demonstrate the feasibility of developing an automatic weighted cross-referencing tool for Gene Ontologies (molecular functions, biological processes, cellular components) using unsupervised learning techniques; illustrate cross-scale capabilities via a self-developed retrieval algorithm for discovering protein families; and, building upon Phase I lessons learned and in-house experience, structure a toxicogenomic ontology strategy that facilitates cross-referencing between the Gene Ontology and toxicology databases. Phase II research will refine and expand the Phase I techniques to enable productive searches across all Gene Ontology domains and to develop prototype toxicogenomic ontologies. In Phase III the search tools will be brought to commercial readiness using traditional funding sources.