Cancer Problem: Cessation of smoking substantially decreases the risk of lung cancer among smokers, but the benefits of programs aimed at reducing smoking are sometimes difficult to assess. Self-reported smoking behavior is often unreliable; independent biochemical verification is necessary to validate survey data and monitor compliance.
Cancer Control Hypothesis: Our basic hypothesis is that the exhaled breath of smokers contains volatile organic compounds (VOCs) that can be used to determine smoker status. A recent study in which we participated found that the benzene concentration in the exhaled breath of smokers is substantially higher than in that of nonsmokers. The data were generated as part of the TEAM (Total Exposure Assessment Methodology) study, which measured VOCs in the breath of a group of subjects in New Jersey as an indicator of human exposure. Although data were obtained on 100-200 volatile constituents in the breath of each participant, the TEAM study focused on only 20 preselected compounds (including benzene) and ignored the remaining measured volatiles. We plan to evaluate a subset of the complete database (69 samples) to test our hypothesis.
Proposed Cancer Control Intervention: The ultimate objective of this research program is to develop a noninvasive detector, based on volatile organic constituents in expired air, to validate compliance with smoking intervention strategies.
Cancer Control Research Phase: In the exploratory study proposed here, the raw GC/MS data will first be conditioned using a set of special computer algorithms. Thereafter, all the variables (GC/MS peaks) will be screened for statistical significance, individually using one-way analysis of variance and in combination using stepwise discriminant function analysis. If a variable or linear combination of variables appears to be significant, we will conclude that it provides a basis for distinguishing between smokers and nonsmokers.
Normally, a study designed especially to obtain such information would be very costly. The fact that the database is already available is a great advantage, since the major costs associated with generating the raw data have already been incurred.
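The individual screening step described above (one-way analysis of variance applied to each GC/MS peak) can be illustrated with a minimal sketch. It is not the proposal's actual software: the function names `anova_f` and `screen_peaks`, the dictionary layout of the peak data, and the boolean smoker labels are all assumptions made for illustration. The sketch computes only the F statistic for each peak and ranks the peaks by it; the stepwise discriminant function analysis is not shown.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)                      # number of groups (2 here: smokers, nonsmokers)
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def screen_peaks(peaks, is_smoker):
    """Rank peaks by F statistic, largest first.

    peaks: hypothetical mapping of peak name -> list of peak areas, one per sample.
    is_smoker: list of booleans, one per sample, in the same order.
    """
    ranked = []
    for name, values in peaks.items():
        smokers = [v for v, s in zip(values, is_smoker) if s]
        nonsmokers = [v for v, s in zip(values, is_smoker) if not s]
        ranked.append((name, anova_f([smokers, nonsmokers])))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

In use, each F value would be compared against the F distribution with (k - 1, n - k) degrees of freedom to judge significance. Because 100-200 peaks are screened at once, some correction for multiple comparisons would presumably be needed as well; that step is likewise an assumption and is not part of the sketch.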