Substantial evidence gathered over the last 50 years shows that poor adherence poses a crucial barrier to effective treatment and survival for cancer and other chronic diseases. At least one in five cancer patients does not adhere to the prescribed treatment regimen, with much higher disease-specific rates. This non-adherence, or deviation from the recommended and expected clinical path, can dramatically increase costs of care, hospitalizations, adverse outcomes, and the chance of preventable death. The causes of non-adherence to treatment regimens are not yet rigorously understood. Current adherence research methods largely rely on survey instruments that have limited scale and scope, provide lagging information that inhibits timely intervention, and offer little actionable information to help patients adhere to their care regimens. Further, the nature and timing of interventions to improve adherence have not been researched in depth. As cancer treatment continues to change, new proactive approaches and methods for surveillance of patient adherence and for targeted interventions are needed. In this project, we examine the feasibility and validity of a novel approach that uses a computational model to glean fine-grained attributes of cancer patients from standard electronic medical records. Our preliminary work has shown that electronic records contain free-form text describing patient sentiment, vitals, medical condition, side effects, social history, and family status, written by physicians, nurses, medical assistants, and other staff during every encounter. With the steady adoption of electronic medical records by clinicians across the US (currently 29% and rising at 12% per year), clinical notes found in electronic records offer a tantalizing source of insight into patient adherence and behavior. 
Current adherence research has not tapped this rich source of data, even though many disciplines, including biomedical informatics, have employed natural language processing and text-mining techniques to glean patterns in semi-structured biomedical data. We aim to employ similar but novel, scalable computational models to glean a rich set of risk factors for patient non-adherence from 1 million patient encounter records covering 24,050 patients over a 10-year time horizon. Our objectives are to estimate a patient's risk of not adhering to a prescribed regimen and to enable targeted and timely interventions through computational analysis of unstructured and structured fields in standard clinical documentation. PUBLIC HEALTH RELEVANCE: We aim to show the feasibility of an early warning system that detects and estimates a cancer patient's risk of non-adherence to treatment regimens by analyzing unstructured text in standard medical records. This technology has tremendous relevance for improved quality of care, proactive management of chronic diseases, and patient safety.