BACKGROUND: While colonoscopy is widely used for colorectal cancer (CRC) screening, there are no data describing its yield in important demographic subgroups. Knowing the yield of colonoscopy for clinically important neoplasia (CIN) and the factors associated with it, and having a tool to risk-stratify individual patients for CIN, would increase the efficiency and effectiveness of screening colonoscopy and of CRC screening in general. SPECIFIC AIMS: 1) Measure and compare the yield of first-time colonoscopy for CIN within pre-specified demographic subgroups and among the indications for colonoscopy; 2) Explore associations between demographic and clinical features and risk for CIN; 3) Determine which features stratify risk for CIN, and derive a risk index for CIN; 4) Establish the database and infrastructure for subsequent cohort studies on the yield of subsequent colonoscopy. METHODS: We will refine a state-of-the-art remote data extraction tool to retrieve de-identified data from the VA's electronic medical record (EMR), pilot test it to ensure accuracy, and use it to retrieve selected data from an estimated 99,000 veterans aged 40 years and older from one of 18 geographically diverse VAMCs who had a first VA-based colonoscopy between 2002 and 2008 for any indication except cancer or polyp surveillance. Programs and software for data extraction will be developed and pilot-tested with independent, "behind-the-firewall" review of a random sample of EMRs from the Indianapolis VAMC and with remote review of an EMR sample from other sites. 
Once the accuracy of the extraction tool has been confirmed, we will use it to extract relevant clinical information from each site, including colonoscopy and pathology reports, clinical features (e.g., colonoscopy indication), and candidate risk factors, which include age, sex, race/ethnicity, physical features (e.g., weight, height, blood pressure), family history of CRC, lifestyle factors (e.g., cigarette smoking, ethanol use), medications, and comorbidity (e.g., diabetes, cholecystectomy, coronary disease). To categorize the colorectal findings, we will use natural language processing (NLP) software developed and tested at our Regenstrief Institute. The NLP software will determine the location, size, and histology of colorectal lesions from free-text colonoscopy and pathology reports, a process that will be validated with independent review of a random sample of reports from each site. We will describe and compare the prevalence of CIN within specific demographic subgroups and by colonoscopy indication. Further, we will construct a risk index that may be used to stratify individuals' risk for CIN and could be used to tailor screening colonoscopy. SIGNIFICANCE: This proposal will provide new knowledge by quantifying the yield of colonoscopy for CIN by age, sex, and race. We will identify factors associated with CIN and may identify patient subgroups at different levels of risk for CIN. This research will improve veterans' healthcare by providing a scientific basis for tailoring CRC screening. Such tailoring will allow providers to target high-risk veterans for colonoscopic screening and to identify veterans at low risk, for whom CRC screening may be performed with less invasive methods (e.g., immunochemical fecal occult blood testing) or deferred until risk increases. Using risk of CIN to tailor CRC screening will make screening more efficient and cost-effective.