Computational techniques that build models to correctly assign chemical compounds to classes of interest are used extensively in pharmaceutical research, at various phases of the drug development process. These techniques address a number of classification problems, such as predicting whether a chemical compound has the desired biological activity, predicting whether it is toxic or non-toxic, and filtering out drug-like compounds from large compound libraries. The overall goal of this proposal is to develop substructure-based classification algorithms for chemical compound datasets. The key elements of these algorithms are that they (i) utilize highly efficient substructure discovery algorithms to mine the chemical compounds and discover all substructures that can be critical for the classification task, (ii) use multiple criteria to generate a set of substructure-based features that simplify the compounds' representation while retaining and exposing the features that are responsible for the specific classification problem, and (iii) build predictive models by employing kernel-based methods that take into account the relationships between these substructures at different levels of granularity and complexity, as well as information provided by traditional descriptors.
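As a rough illustration of how the three elements fit together, the following minimal sketch assumes that substructure discovery has already been run and that each compound is represented as a set of substructure labels. The labels, the toy data, the function names (`select_features`, `tanimoto`, `predict`), and the two selection criteria are all hypothetical. The mean-similarity scoring rule is a deliberately simple stand-in for the kernel-based models the proposal has in mind, not the proposed method itself.

```python
from collections import Counter

def select_features(compounds, labels, min_support=2):
    """Keep substructures that (a) occur in at least `min_support` compounds
    and (b) occur at different rates in the two classes.
    Assumes binary labels (1 = active, 0 = inactive) with both classes present."""
    support = Counter(s for c in compounds for s in c)
    pos = Counter(s for c, y in zip(compounds, labels) if y == 1 for s in c)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    keep = set()
    for s, n in support.items():
        if n < min_support:
            continue  # criterion (a): too rare to be reliable
        if pos[s] / n_pos != (n - pos[s]) / n_neg:
            keep.add(s)  # criterion (b): not equally common in both classes
    return keep

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two substructure sets,
    a common kernel choice for set-valued chemical features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict(query, train, labels, features):
    """Assign the class whose training compounds are, on average,
    most similar to the query under the kernel (ties go to class 0)."""
    q = query & features
    scores = {}
    for cls in (0, 1):
        members = [c & features for c, y in zip(train, labels) if y == cls]
        scores[cls] = sum(tanimoto(q, m) for m in members) / len(members)
    return 1 if scores[1] > scores[0] else 0

# Hypothetical toy data: substructure labels per compound, 1 = active.
train = [
    {"benzene", "nitro", "amine"},
    {"benzene", "nitro"},
    {"benzene", "hydroxyl"},
    {"benzene", "hydroxyl", "carboxyl"},
]
labels = [1, 1, 0, 0]

features = select_features(train, labels)
print(sorted(features))
print(predict({"benzene", "nitro", "chloro"}, train, labels, features))
```

In this toy setting, `benzene` is discarded because it appears in every compound of both classes and so carries no class information, while the rare `amine` and `carboxyl` fall below the support threshold. In practice the scoring rule would be replaced by a kernel classifier such as a support vector machine, and the set-valued features could be combined with traditional descriptors.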