Chemical space is big data: the number of drug-like molecules is estimated to exceed 10^60. Experimentally screening compound libraries for drug candidates is time-consuming and expensive. Virtual screening is a cheaper, faster approach for identifying potential drug candidates, but existing virtual screening methods typically scale linearly with the size of the compound library. A virtual screen of a million compounds may take days and requires a significant investment in computational infrastructure. The lack of scalable virtual screening algorithms and the difficulty of accessing the infrastructure necessary to perform large-scale virtual screening severely limit the ability of researchers to explore the big data of chemical space. This research plan will develop scalable algorithms that enable virtual screening on an interactive time scale (seconds to minutes). Interactive algorithms support the integration of expert human insight and knowledge with computational methods and permit rapid hypothesis testing and exploration. These algorithms will be deployed both as open-source software and as part of an online drug discovery collaboration environment. The online environment will provide immediate access to the big data infrastructure needed for rapid, collaborative virtual screening. Algorithms for filtering compound libraries based on pharmacophore and molecular shape properties will be developed. Unlike current approaches, these algorithms will scale with the breadth and complexity of the query, not with the size of the compound database, enabling rapid filtering of billions of chemical structures. Efficient methods for ranking the filtered results that harness the computational power of modern graphics processing units will also be developed. Backed by the appropriate computational resources, these algorithms will support the screening of billions of chemical structures on an interactive time scale.
The interactive performance of the tools will support rapid hypothesis testing and experimentation, and users will be able to submit their own compound libraries for screening, encouraging cross-disciplinary collaboration.
RELEVANCE: The proposed research will result in novel algorithms and systems for the storage, retrieval, and analysis of chemical data to support the rapid identification of compounds of therapeutic interest. Successful application of these algorithms will reduce the cost and time of developing new drugs.