Our long-term objective is to develop accurate and generalizable prognostic models using complex data sources and thereby help improve patient outcomes. This study is motivated by the clinical need for accurate prognostic models for esophageal cancer and the methodological need for better approaches to analyzing complex data sources. Esophageal cancer presents a unique set of challenges that underscores the importance of this effort. First, its relatively low incidence makes large randomized experiments difficult to carry out, so large observational databases become a particularly important source for research. Second, its rising incidence and high mortality rate create an urgent need to accurately identify prognostic factors toward which effective interventions could be directed. However, because of limitations in either data sources or statistical approaches, the prognostic roles of histologic type, variation in treatment (i.e., the extent of surgery), and quality of care (using provider volume as a surrogate), along with their clinical and health policy implications, remain hotly debated. Third, although advanced statistical methods such as mixed-effects models are available to handle a key aspect of data complexity in large databases, namely the heterogeneity due to clustering, few prognostic models are built upon these methods. This is partly due to the lack of guidance on how best to model the clustered data structure. Questions remain regarding whether, by how much, and in which settings such modeling of the data structure would improve the accuracy of a prognostic model. The goal of this study is to improve our understanding of the key variables in esophageal cancer prognosis and to advance our knowledge of the validity and importance of advanced statistical methods in prognostic modeling using complex data sources.
This goal will be achieved through the following specific aims: (1) To develop comprehensive prognostic models for multiple outcomes (including fatal and nonfatal short-term complications, and overall and disease-specific long-term survival) following esophageal cancer treatment using the SEER-Medicare database. Bayesian mixed-effects models will be used to account for the clustered data structure. The models will be validated internally and externally. (2) To evaluate methods for clustered survival data using statistical simulations. By simulating data with complex structure that mimic real studies, we will investigate the operating characteristics of several commonly used frailty models on issues of importance to prognostic modeling, including individual-level predictor effect estimation, cluster-level predictor effect estimation, heterogeneity assessment, and outcome prediction. The performance of the frailty model approach will be compared with that of other techniques, including the marginal and stratified approaches, with regard to accuracy in predictor effect estimation and outcome prediction. Results from this study will be directly applicable to clinical and health policy decision making for the care of older patients with esophageal cancer. They will also benefit future prognostic modeling using registry, administrative, and observational databases.
PUBLIC HEALTH RELEVANCE: The proposed studies aim to develop comprehensive prognostic models for esophageal cancer using the linked SEER-Medicare database and to compare methods for clustered survival data using statistical simulations. The new prognostic models will help to more accurately predict outcomes for esophageal cancer patients with a particular set of disease characteristics who receive a specific treatment from physicians at a particular institution with a given quality of care.
Results from the simulation study will advance our knowledge of the operating characteristics of these advanced statistical methods and provide guidance for future prognostic modeling efforts using registry, administrative, and observational databases.
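
As an illustration of the kind of data generation the simulation aim describes, the following is a minimal sketch of simulating clustered survival data under a shared gamma frailty model. It is not the study's actual simulation design: the function name, the exponential baseline hazard, the administrative censoring time, and all parameter values (effect sizes, frailty variance, cluster counts) are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_clustered_survival(n_clusters=200, cluster_size=20,
                                beta_x=0.5, beta_z=-0.3, theta=0.5,
                                base_rate=0.1, censor_time=10.0, seed=0):
    """Simulate right-censored survival times with a shared gamma frailty.

    All defaults are hypothetical values for illustration only.
    """
    rng = np.random.default_rng(seed)

    # One frailty per cluster (e.g., per provider): mean 1, variance theta.
    frailty = rng.gamma(shape=1.0 / theta, scale=theta, size=n_clusters)

    # Cluster-level predictor (e.g., a high-volume provider indicator).
    z = rng.binomial(1, 0.5, size=n_clusters)

    # Cluster membership and an individual-level predictor for each patient.
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.normal(size=cluster.size)

    # Proportional hazards with a constant (exponential) baseline hazard,
    # multiplied by the frailty of the patient's cluster.
    hazard = base_rate * frailty[cluster] * np.exp(beta_x * x + beta_z * z[cluster])
    event_time = rng.exponential(scale=1.0 / hazard)

    # Administrative censoring at a fixed follow-up time.
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)

    return {"cluster": cluster, "x": x, "z": z[cluster],
            "time": time, "event": event, "frailty": frailty}


if __name__ == "__main__":
    data = simulate_clustered_survival()
    print("patients:", data["time"].size)
    print("observed event fraction:", data["event"].mean())
```

Data sets generated this way, with known values of the individual-level effect, cluster-level effect, and frailty variance, could then be fitted with frailty, marginal, and stratified models to compare how well each recovers the true parameters.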