The broad long-term objectives of this proposal are to collaborate with a group of biology researchers with real world needs to develop and distribute a general-purpose system (BioMediator) to permit integration and analysis of diverse types of biologic data. BioMediator will combine information from a variety of different public and private sources (e.g. experimental data) to help answer biologic questions. BioMediator builds on the foundations laid by the currently funded GeneSeek data integration system. The GeneSeek system was originally developed to query only public domain data sources (both structured and semi-structured) to assist in the curation of the GeneClinics genetic testing knowledge base. The specific aims leading to the development of the BioMediator system are: 1) Interface to additional public domain biological data sources (e.g. pathway databases, function databases). 2) Incorporate access to private databases of experimental results (e.g. proteomics and expression array data). 3) Extend model to include analytic tools operating across distributed biological data sources (e.g. across a set of both proteomic and expression array data). 4) Evolve centralized BioMediator system into a model peer to peer data sharing and analysis system. 5) Distribute and maintain BioMediator production software as a resource for the biological community. The health relatedness of the project is that biologists seeking to understand the molecular basis of human health and disease are struggling with large and increasing volumes of diverse data (mutation, expression array, proteomic) that need to be brought together (integrated) and analyzed in order to develop and test hypotheses about disease mechanisms and normal physiology. The research design is to develop BioMediator by combining and leverage recent developments in a) the domain of open source analytic tools for biologic data and b) ongoing theoretical and applied research by members of the current GeneSeek research team on both general purpose and biologic data integration systems. The methods are: a) to use an iterative rapid prototyping software development model evaluated in a real-world test bed and b) to expand the existing GeneSeek research team (with expertise in informatics, computer science, and software development) to include biological expertise (four biologists forming a biology working group) and biostatistics expertise. The goal is to ensure the BioMediator system 1) meets the needs of a group of end users acquiring, integrating and analyzing diverse biologic data sets, 2) does so in a scaleable and expandable manner drawing on the latest theoretical developments in data analysis and integration.