PROJECT SUMMARY
Collaborative Drug Discovery, Inc. (CDD) proposes to develop a novel approach based on deep learning neural networks to encode molecules into chemically rich vectors. We will first apply this representation to build more powerful computational models that can more accurately predict properties such as bioactivity, ADME/Tox, and pharmacokinetics across libraries of molecular structures. The ultimate goal is to leverage this representation to generate novel compounds with better combinations of properties. Both of these capabilities will help scientists to accelerate discovery of new drugs broadly across many therapeutic areas.
Scientists engaged in drug discovery research, from academic laboratories to large pharmaceutical companies, rely on computational QSAR models to predict pharmacologically relevant properties and obviate the need to perform expensive, time-consuming assays (many of which require animal studies) for every molecule of interest. Some properties (e.g., logP) can now be modeled with such high confidence that the models have replaced the need to perform the assays, but many other critical properties (e.g., solubility, ADME, PK, hERG) remain far from this goal. We expect that our proposed chemically rich vectors will significantly advance the state of the art beyond what can be achieved with conventional descriptors and fingerprints. Improved models will enable researchers to select lead candidate series more effectively, explore chemical space around leads to generate novel IP more efficiently, reduce failure rates for compounds advancing through the drug discovery pipeline, and accelerate the entire drug discovery process. These benefits will be realized broadly across most therapeutic areas.
Our central innovation is a novel computational strategy: first develop a deep learning (DL) model optimized to best capture the essential structural and chemical features of molecules, starting from the most natural structural representation; then validate the DL model by applying it to improve QSAR modeling of pharmacological properties; and finally extend it to generate previously unknown molecules that have superior properties (the so-called "inverse QSAR" problem, which is the Holy Grail of computational medicinal chemistry). Others have unsuccessfully tried to leap directly to solve the inverse QSAR problem. We propose a more patient and methodical approach that will allow the neural network to perform self-supervised training to learn about chemical structures and properties from readily available, extremely large datasets, then transfer this learning to improve modeling; only after establishing this solid foundation do we intend to apply the models to attempt inverse QSAR. Prior attempts in this area have also relied on neural network architectures designed originally for language processing. We will design a new architecture, more akin to neural network architectures that have proven most successful at image classification, and optimize it to directly process the "molecular graph" that represents the relationship of atoms and bonds in molecules.
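To make the notion of processing a molecular graph concrete, the following is a minimal sketch of how atoms and bonds might be turned into a fixed-length vector. It is an illustration only, not the architecture proposed here: the three-element atom vocabulary, the one-hot features, the sum-over-neighbors update, and all function names are assumptions chosen for brevity, and a real model would use learned weights and richer atom and bond features.

```python
# Illustrative only: a toy molecular graph and one neighbor-aggregation step.
# The atom vocabulary, feature scheme, and update rule are assumptions,
# not the architecture described in this summary.

ATOM_TYPES = ["C", "N", "O"]  # hypothetical, deliberately tiny vocabulary


def one_hot(symbol):
    """Encode an atom symbol as a one-hot feature vector."""
    return [1.0 if symbol == t else 0.0 for t in ATOM_TYPES]


def neighbors(num_atoms, bonds):
    """Build an adjacency list from undirected bonds given as (i, j) pairs."""
    adj = {i: [] for i in range(num_atoms)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def aggregate(features, adj):
    """One round: each atom's new vector is its own plus its neighbors'."""
    updated = []
    for i, own in enumerate(features):
        vec = list(own)
        for j in adj[i]:
            vec = [a + b for a, b in zip(vec, features[j])]
        updated.append(vec)
    return updated


def readout(features):
    """Sum all atom vectors into a single fixed-length molecule vector."""
    return [sum(col) for col in zip(*features)]


# Ethanol, heavy atoms only: C-C-O (hydrogens omitted for simplicity)
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

adj = neighbors(len(atoms), bonds)
h0 = [one_hot(a) for a in atoms]   # initial per-atom features
h1 = aggregate(h0, adj)            # features after one aggregation round
molecule_vector = readout(h1)      # graph-level vector
```

After one round, each atom's vector reflects its immediate bonded neighborhood, and the readout yields a vector whose length does not depend on the number of atoms. Repeating the aggregation step widens the neighborhood each atom can see, loosely analogous to how stacked convolutional layers widen the receptive field in image models.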