Lung cancer (LC) is the deadliest cancer worldwide, causing up to 3 million deaths annually. The two major types of LC are small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC). These two types differ in incidence, underlying biology, progression, and prognosis. The incidence of SCLC is lower than that of NSCLC, but SCLC is more aggressive, so the 5-year death rate for SCLC is comparable to that for NSCLC. Although both types are associated with smoking, the association is stronger for SCLC; nonetheless, very few studies have focused specifically on SCLC. Tobacco-smoke exposure (TSE) has been shown to alter the expression of numerous genes in bronchial epithelial cells. We propose a two-step approach to identify and validate candidate genes whose altered expression causes SCLC. In the first step, we will combine bioinformatics and statistical methods to predict genes whose altered expression can cause SCLC. In the second step, we will validate the top 10 candidate genes (5 for SCLC and 5 for NSCLC) using a case-control design. We propose the following specific aims:

1. To predict and identify genes whose expression is altered by TSE, leading to SCLC and NSCLC (adenocarcinomas). The candidate genes will be identified by analyzing gene expression data in the Smoking-Induced Epithelial Gene Expression (SIEGE) database and the Gene Expression Omnibus (GEO) database.

2. To validate the top 10 candidate genes (5 for SCLC and 5 for NSCLC) using a case-control association study. Polymorphisms in the promoter regions of the top candidate genes, nonsynonymous single-nucleotide polymorphisms (SNPs), and SNPs located at sites important for splicing will be evaluated to determine whether they confer risk for lung cancer.

The proposed study will thus identify genes whose TSE-altered expression causes SCLC. Identifying these genes will improve our ability not only to diagnose and treat SCLC early but also to prevent it.