DESCRIPTION (Taken from application abstract): We propose to develop automated techniques to facilitate classification and pattern recognition in biomedical data sets. These techniques will involve the development of novel neural network architectures, as well as the formulation of principles governing their construction and the explanation of their results. Specifically, to address the problem of recognizing infrequent categories, we will develop hierarchical and sequential systems of feedforward neural networks. These systems will use information such as (a) prior knowledge of the domain and/or (b) natural clusters identified by clustering or other unsupervised learning methods to define intermediate classification goals, applying a divide-and-conquer approach to complex classification problems. Additionally, we will develop generic tools for pre-processing input data: transforming the original data, reducing dimensionality, and producing training and test sets suitable for cross-validation and bootstrap resampling. We will build evaluation tools that measure calibration, resolution, and the importance of variables, and that compare different models. Furthermore, we will develop standardized interfaces for selected existing classification models. We will use a component-based architecture to build our neural networks and will write interfaces to existing classification models (e.g., regression trees, logistic regression models) so that they can be interchanged in a user-friendly manner. We will use our pre-processing modules to prepare data for input into a variety of classification models. The models will first be evaluated in isolation and then combined, to test the hypothesis that the combined system performs better than the individual models on real biomedical data sets in terms of calibration, resolution, and explanatory power.
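One way to compute the calibration and resolution measures mentioned above is sketched below. It assumes binary outcomes and predicted probabilities, and it assumes two common choices in medical prediction: the Hosmer-Lemeshow statistic for calibration and the area under the ROC curve (computed as a c-index) for resolution. The function names `c_index` and `hosmer_lemeshow` are hypothetical and are not taken from the proposal.

```python
from typing import Sequence


def c_index(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Resolution (discrimination): area under the ROC curve, computed as the
    fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    score = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                score += 1.0
            elif pp == pn:
                score += 0.5
    return score / (len(pos) * len(neg))


def hosmer_lemeshow(labels: Sequence[int], probs: Sequence[float],
                    groups: int = 10) -> float:
    """Calibration: Hosmer-Lemeshow chi-square statistic over groups of cases
    sorted by predicted probability (smaller means better calibrated)."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than cases: skip empty groups
        observed = sum(y for _, y in chunk)   # observed positives in group
        expected = sum(p for p, _ in chunk)   # expected positives in group
        m = len(chunk)
        variance = expected * (1 - expected / m)
        if variance > 0:  # groups predicted as all-0 or all-1 contribute nothing
            stat += (observed - expected) ** 2 / variance
    return stat
```

Reporting both numbers is useful because a model can rank cases well (high c-index) while its probabilities are systematically too high or too low (large Hosmer-Lemeshow statistic), and vice versa.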
This research will (a) quantify the improvement in performance when a classification problem is systematically broken down into subproblems, (b) quantify the advantages of combining different types of classifiers, (c) create a library of reusable neural network classification models, data pre-processing tools, and evaluation tools that use standardized interfaces, and (d) foster dissemination of classification models and the use of the pre-processing and evaluation tools by making them available to other researchers through the World Wide Web. We will test four hypotheses: (1) Combinations of different modalities of classifiers perform significantly better than isolated models. (2) Hierarchical and sequential neural networks perform better than standard neural networks. (3) Unsupervised models can decompose a problem for hierarchical or sequential neural networks better than decompositions based on prior domain knowledge. (4) It is possible to build a Classification Tool Kit composed of data pre-processing modules, classification models, and evaluation modules in which the components are independent, reusable, and interchangeable.
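Hypotheses (1) and (4) rest on classifiers sharing a common interface so that they can be swapped and combined. The sketch below illustrates that idea under stated assumptions: the `Classifier` protocol, its `fit`/`predict_proba` methods (naming loosely modeled on scikit-learn's convention), and the `PriorModel`, `StumpModel`, `AveragingEnsemble`, and `compare` names are all hypothetical. `StumpModel` is a toy one-split stand-in for a regression tree, and averaging predicted probabilities is only one of many possible ways to combine classifiers.

```python
from typing import Dict, List, Protocol, Sequence

Row = Sequence[float]


class Classifier(Protocol):
    """Hypothetical standardized interface shared by every model component."""
    def fit(self, X: Sequence[Row], y: Sequence[int]) -> "Classifier": ...
    def predict_proba(self, X: Sequence[Row]) -> List[float]: ...


class PriorModel:
    """Baseline: predicts the training prevalence for every case."""
    def fit(self, X, y):
        self.rate = sum(y) / len(y)
        return self

    def predict_proba(self, X):
        return [self.rate for _ in X]


class StumpModel:
    """Toy one-split tree on feature 0: predicts the positive rate on each
    side of the training median."""
    def fit(self, X, y):
        values = sorted(row[0] for row in X)
        self.cut = values[len(values) // 2]
        left = [t for row, t in zip(X, y) if row[0] < self.cut]
        right = [t for row, t in zip(X, y) if row[0] >= self.cut]
        overall = sum(y) / len(y)
        self.left_rate = sum(left) / len(left) if left else overall
        self.right_rate = sum(right) / len(right) if right else overall
        return self

    def predict_proba(self, X):
        return [self.right_rate if row[0] >= self.cut else self.left_rate
                for row in X]


class AveragingEnsemble:
    """Combines heterogeneous classifiers by averaging their probabilities;
    it satisfies the same interface, so it can replace any single model."""
    def __init__(self, members: Sequence[Classifier]):
        self.members = list(members)

    def fit(self, X, y):
        for m in self.members:
            m.fit(X, y)
        return self

    def predict_proba(self, X):
        preds = [m.predict_proba(X) for m in self.members]
        return [sum(col) / len(col) for col in zip(*preds)]


def compare(models: Dict[str, Classifier], X_train, y_train, X_test):
    """Fits each interchangeable model and returns its test predictions."""
    return {name: m.fit(X_train, y_train).predict_proba(X_test)
            for name, m in models.items()}
```

Because the ensemble exposes the same two methods as its members, the calling code in `compare` does not change when a single model is replaced by a combination, which is the kind of interchangeability hypothesis (4) describes.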