DESCRIPTION: Collaborative Drug Discovery, Inc. (CDD) will create a novel web-based software platform that enables scientists to work together effectively to discover and improve new drug leads by sharing computational predictions based on open-source descriptors and models, for the first time without needing to reveal the underlying chemical structures and biodata. It will be the first practical system for biocomputational analysis across distributed datasets with different owners that respects data privacy. By lowering this key barrier to collaboration, the platform will accelerate the pre-clinical drug discovery pipeline. Research aimed at neglected diseases and orphan indications will especially benefit, because it often relies on the loosely affiliated efforts of academic investigators, non-profit foundations, government laboratories, and small biotechnology firms (extra-pharma entities). Such efforts typically lack not only the resources but also the integrated workflows of discovery projects conducted at large pharmaceutical companies (within which data can be shared freely across departments). The project will for the first time enable researchers focused on neglected diseases and orphan indications to effectively exploit biocomputational tools such as virtual screening and ADME/Tox predictions, which are now considered standard and indispensable components of early discovery workflows within large pharma. It will also make it easier for these extra-pharma researchers to collaborate with large pharma and benefit from large pharma's significant investment in accumulating large, high-quality datasets. In Phase II of this SBIR project, CDD will: 1. Create a stand-alone platform, based entirely on open-source technologies, that enables researchers to share models, share predictions from models, and create models from distributed, heterogeneous QSAR data, all without needing to divulge the underlying training sets. 2. 
Develop approaches that enable scientists who are not computational chemists to exploit the technology. A series of user interfaces will automate and intelligently guide the creation and use of models, and help the user visualize domains of applicability, interpret results, and understand their limitations. The integrated platforms will enable scientists to seamlessly create, share, and execute computational models leveraging private data vaults, with or without sharing the underlying training data. 3. Validate the platform by (a) developing a suite of at least five ADME/Tox and physicochemical property models based on open-source descriptors and data obtained from commercial ADME vendors, as well as public data from PubChem, ChEMBL, and other sources, (b) securely making available a series of sophisticated pre-competitive ADME/Tox models provided by large pharmaceutical companies, and (c) demonstrating that collaborators can use the platform on their own (without relying on a computational chemist) to discover and advance TB drug leads with good ADME/Tox properties.