Patients with non-small cell lung carcinoma (NSCLC) have a poor prognosis. Our goal is to develop methods that translate gene expression profiles into knowledge leading to the identification of molecular targets for prediction and therapy in these patients. Current clinical (staging) models for predicting survival in patients with small, localized tumors (Stage I) fail to identify the subset that will die within 2.5 years of "curative" surgery. Our preliminary research with rule-based models in Stage I lung adenocarcinoma patients shows meaningful results for survival prediction when the models combine gene expression data with clinical data. Our primary hypothesis is: Clinical and gene expression attributes combine in rule-based models to accurately predict patient outcomes (time to death or recurrence). We propose to test this hypothesis for patients with resectable NSCLC. The Specific Aims are to:

1. Test the accuracy of the rule-based models as a function of the data type. We will use several established algorithms for generating decision rule models. Leave-one-out cross-validation (LOOCV) will be used to evaluate model accuracy, complexity, and stability. The "gold standard" for performance comparison is the AJCC staging model.

2. Compare the performance of rule-based induction with that of other algorithms. We will use support vector machines and artificial neural networks. We will address the role of feature selection in model performance. We will also confirm the significance of model variables using a standard statistical method, partial least squares. In addition, we will experimentally evaluate random and adaptive subsampling methods (bagging and boosting, respectively), which combine by voting multiple rule models produced by one rule-based system, C4.5.

3. Investigate rule induction as a means of hypothesizing causal mechanisms in survival. We will develop a novel extension to rule induction that we call backward-chaining rule induction, a form of "semi-supervised" learning. We expect it to focus the search for variable interactions on "interesting" rules (i.e., those with mechanistic explanations) more effectively than purely unsupervised approaches. Knowledge discovery will be evaluated based on domain expertise in the molecular basis of cancer.
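
The evaluation procedure named in Aim 1 can be illustrated with a minimal sketch. It is not the proposal's implementation: a single-threshold rule learner (a decision stump) stands in for the established rule-induction algorithms, and the eight patients, the two attributes (a tumor size and one gene expression level), and the outcome labels are invented purely for illustration. The function names `fit_stump`, `predict`, and `loocv_accuracy` are likewise hypothetical.

```python
from typing import Sequence, Tuple

# A rule is (feature index, threshold, label if value <= threshold, label otherwise).
Rule = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Rule:
    """Return the single-threshold rule with the fewest training errors."""
    best_rule: Rule = (0, 0.0, 0, 0)
    best_errors = len(y) + 1
    for j in range(len(X[0])):
        # Candidate thresholds are the observed values of feature j.
        for t in sorted({row[j] for row in X}):
            for low, high in ((0, 1), (1, 0)):
                errors = sum(
                    (low if row[j] <= t else high) != label
                    for row, label in zip(X, y)
                )
                if errors < best_errors:
                    best_errors, best_rule = errors, (j, t, low, high)
    return best_rule


def predict(rule: Rule, row: Sequence[float]) -> int:
    """Apply a fitted rule to one patient."""
    j, t, low, high = rule
    return low if row[j] <= t else high


def loocv_accuracy(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Hold out each patient once, refit on the rest, and score the held-out case."""
    hits = 0
    for i in range(len(y)):
        X_train = [row for k, row in enumerate(X) if k != i]
        y_train = [label for k, label in enumerate(y) if k != i]
        rule = fit_stump(X_train, y_train)
        hits += predict(rule, X[i]) == y[i]
    return hits / len(y)


# Invented toy data: [tumor size in cm, expression level of a hypothetical gene].
X = [[1.2, 0.3], [2.5, 0.4], [1.8, 0.2], [2.9, 0.5],
     [1.5, 2.1], [2.2, 2.4], [1.1, 1.9], [2.7, 2.6]]
# Invented labels: 1 = death within 2.5 years of surgery, 0 = otherwise.
y = [0, 0, 0, 0, 1, 1, 1, 1]

acc = loocv_accuracy(X, y)
print(f"LOOCV accuracy: {acc:.3f}")
```

Because the rule is refit inside every fold, the held-out patient never influences the rule that scores it. This is also the place where any feature selection step would have to run to avoid an optimistic accuracy estimate.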