The main goal of this project is to develop, validate, and deliver efficient computational tools for rapid and reliable prediction of biological activity and/or related pharmacological or pharmaceutical properties of drug-like molecules. We plan to develop statistically significant and robust Quantitative Structure-Activity Relationship (QSAR) methodologies, which incorporate rigorous validation procedures and lead to models with high predictive power and practical utility. QSAR approaches are based on the SAR premise: similarity or diversity of chemical structures determines similarity or diversity of their biological action. We argue that formal assessment of chemical similarity or diversity in terms of various chemically relevant molecular descriptors may be inadequate in the case of QSAR modeling, because the relative association of descriptors with the target property (biological activity) is not known a priori. We suggest that similarity (diversity) should be evaluated in the context of the target property, and we employ objective similarity and diversity functions to achieve biologically meaningful clustering of compounds in the descriptor space. Therefore, our methodologies employ variable selection procedures aimed at identifying the descriptors that are most relevant to the target property. Rigorous model validation ensures the highest hit rates (i.e., the highest content of biologically active compounds among the top-scoring compounds) when predictive QSAR models are ultimately applied to screening chemical databases or virtual libraries.
The four major areas of concentration in this proposal are: (1) development of novel, mainly non-linear QSAR methods, such as the k-nearest-neighbor (kNN) QSAR approach, applicable to very large datasets, with emphasis on the development of novel descriptors of chemical structure and on the efficiency, automation, and statistical robustness of the underlying methodologies; (2) development of efficient and objective QSAR model validation methodologies, which maximize the predictive ability of the models and ensure their accuracy in rational design of focused libraries and database mining applications; (3) application of validated QSAR modeling methods to various datasets of pharmacological or pharmaceutical importance, including the development of new approaches for effective data analysis and model interpretation based on advanced algorithms for data compression and mapping from high- to low-dimensional descriptor space; (4) implementation of all modeling methods developed and validated in the course of this work in the publicly accessible UNC QSAR web server. Successful implementation of this proposal is expected to afford highly automated and predictive QSAR modeling tools, which will benefit a broad research community working in the area of drug design and discovery.
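To make the ideas of property-driven variable selection, validation, and hit rate concrete, the following is a minimal Python sketch, not the project's implementation. It assumes a kNN model that predicts activity as the mean activity of the k nearest compounds, uses leave-one-out cross-validated q^2 as the validation statistic, and substitutes a plain exhaustive subset search for whatever variable selection procedure the project actually uses. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def knn_predict(train_x, train_y, query, k, cols):
    """Mean activity of the k nearest training compounds, with Euclidean
    distance computed only over the selected descriptor columns."""
    dists = sorted(
        (sqrt(sum((row[c] - query[c]) ** 2 for c in cols)), y)
        for row, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_q2(x, y, k, cols):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        # Hold out compound i and predict it from all the others.
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_x, rest_y, x[i], k, cols)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


def select_descriptors(x, y, k, subset_size):
    """Descriptor subset with the highest LOO q^2 (exhaustive search;
    an assumption standing in for the project's selection procedure)."""
    n_desc = len(x[0])
    return max(
        combinations(range(n_desc), subset_size),
        key=lambda cols: loo_q2(x, y, k, cols),
    )


def hit_rate(scores, is_active, top_n):
    """Fraction of truly active compounds among the top_n highest scores."""
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    return sum(1 for _, active in ranked[:top_n] if active) / top_n


# Toy data (hypothetical): descriptor 0 tracks activity, descriptor 1 is noise.
toy_x = [[1.0, 9.0], [2.0, 1.0], [3.0, 7.0], [4.0, 2.0], [5.0, 8.0], [6.0, 3.0]]
toy_y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]

best_cols = select_descriptors(toy_x, toy_y, k=1, subset_size=1)
best_q2 = loo_q2(toy_x, toy_y, k=1, cols=best_cols)
toy_hit_rate = hit_rate([0.9, 0.1, 0.8, 0.4], [True, False, False, True], top_n=2)
```

On this toy set the search keeps the descriptor that actually tracks activity and discards the noisy one, which illustrates why similarity is evaluated in the context of the target property rather than over all descriptors at once. Exhaustive search is practical only for small descriptor pools; large pools would need a stochastic or heuristic search.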