Great progress has been made in the last two decades in our understanding of the use of observer performance studies to evaluate the performance of diagnostic systems, in both absolute and relative terms. Analytical approaches that account for variability of cases, readers, and modes are rapidly gaining credibility; hence, their use is becoming common not only in scientific investigations but also in industry demonstrations of system utility as a condition for regulatory approval. ROC-type methodology for system evaluations and comparisons has become an extremely versatile tool that can address a wide range of scientific and clinical questions in the laboratory environment. The more important question in all of these studies, however, is not whether results generalize to cases, readers, abnormalities, and modalities under the study/laboratory conditions, but whether they enable valid inferences about the potential impact of different technologies or practices on the actual clinical environment. Although this may seem intuitive, to date there is no conclusive evidence for the latter. The very limited experimental data available in this regard suggest the contrary. The primary goal of this project is to determine the performance levels of observers in recommending recall leading to the detection of breast cancers in the clinical environment and to compare them with the same observers' performance in recommending recall and detecting breast cancers in the laboratory. This will be done by ascertaining and verifying participants' performance levels retrospectively from QA and clinical records and by performing a two-mode observer performance study: one mode will simulate the clinical environment (using BI-RADS ratings), and the other will be an ROC-type study (using confidence ratings).
Readers will review and interpret both cases that they had previously diagnosed prospectively in the clinic and cases diagnosed by others. This comparison lies at the very core of whether laboratory observer performance data can be validly generalized to the clinical environment.