The Surveillance, Epidemiology, and End Results (SEER) Program is one of the premier cancer surveillance programs in the world. It currently comprises population-based cancer registries that cover approximately 30% of the US population and report over 450,000 cancer cases annually. The information collected on each cancer patient in SEER includes demographics, a description of the cancer (e.g., morphology, grade, and extent of disease such as tumor size and number of positive nodes), limited initial treatment information, and patient follow-up, including cause of death for deceased patients. SEER registries collect information on first-course treatment, including surgery, chemotherapy, radiation therapy, and hormone therapy. However, the SEER program does not release chemotherapy or hormone therapy data because these data are believed to be incomplete. Ongoing activities are investigating whether SEER chemotherapy data can be supplemented with other data sources. For example, SEER chemotherapy data for older cancer patients (aged 65 and over) were recently augmented with Medicare claims (Noone AM, Medical Care 2014) from inpatient hospitalizations, outpatient facilities, and physicians, but not with Medicare Part D claims. Augmenting SEER data with Medicare Part D is important because this data source contains valuable information on prescription drug use among elderly cancer patients. Since a growing number of cancer patients take oral chemotherapy (i.e., prescription drugs), supplementing the SEER chemotherapy variable with Medicare Part D data will yield more accurate and complete treatment information for SEER cancer patients.
However, only approximately 50% of Medicare beneficiaries have Part D coverage. There is therefore a need to develop imputation strategies that use available patient-level information, area-level socio-demographic information, and Medicare coverage information (e.g., most patients have Part A and Part B coverage) to infer prescription drug use for Medicare beneficiaries without Part D coverage. Additionally, data collection on several clinical and molecular factors, such as human epidermal growth factor receptor 2 (HER2/neu) status for breast cancer, began in 2010. The ability to develop and distribute imputed data sets will be even more important for these factors, because the initial years of data collection on important biomarkers such as HER2 status will likely include many missing observations. With these imputed data sets, we can provide researchers with tools to better understand molecular and genetic alterations in breast cancer and to report incidence trends as accurately as possible.