Medical decision support tools are increasingly available on the Internet and are used by lay persons as well as health care professionals. The goal of some of these tools is to provide an "individualized" prediction of future health-care-related events, such as prognosis in breast cancer, given specific information about the individual. These tools are usually based on models built from fine-grained data. Under the umbrella of "personalized" medicine, these individualized prognostic assessments are sought as a means to replace the general prognostic information given to patients with specific probability estimates that pertain to a small stratum to which the patient belongs, and ultimately to each individual patient (i.e., a stratum with n=1). These estimates are then used to inform decision making and are therefore of critical importance for public health. Responsible use of prognostic models for patient counseling and medical decision making requires thorough model validation. Verifying that the estimated or predicted event probabilities reflect the underlying true probability for a particular individual (i.e., verifying the calibration of the prognostic model) is a critical but often overlooked step in evaluation, which usually favors verifying the discriminatory ability of the model. Selection of the best predictive model for a given problem should be based on a robust comparison that takes into account errors in individual predictions, calibration, and discrimination indices. A robust test for comparing calibration across different models does not currently exist.
Our specific aims are to: (1) Characterize the main deficiencies of existing calibration indices in the context of individualized predictions and develop a new model-independent calibration index and comparison test that can be used to assess and compare predictive models based on both statistical regression and machine learning methods; (2) Unify the theories on decomposition of error into discrimination and calibration components stemming from the statistical and machine learning communities to derive a refined measure of calibration that can be calculated from measures of error and discrimination. We will compare the performance of the new methods with existing ones in different predictive models derived from real clinical data related to different medical domains.
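As an illustration of what a decomposition of error into calibration and discrimination components can look like, the sketch below computes the classical Murphy decomposition of the Brier score. This is a well-known existing decomposition, not the refined measure proposed in this project. The function name `brier_decomposition` and the example data are hypothetical, and the sketch assumes binary outcomes and groups patients by identical predicted probability, in which case the identity Brier = reliability - resolution + uncertainty holds exactly.

```python
import numpy as np

def brier_decomposition(p, y):
    """Murphy decomposition of the Brier score (illustrative sketch).

    p : predicted event probabilities, one per individual
    y : observed binary outcomes (0 or 1), same length as p
    Individuals are grouped by identical predicted probability.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    base_rate = y.mean()

    reliability = 0.0  # calibration term: lower is better
    resolution = 0.0   # discrimination-like term: higher is better
    for value in np.unique(p):
        mask = p == value
        n_k = mask.sum()
        observed = y[mask].mean()  # observed event rate in this group
        reliability += n_k * (value - observed) ** 2
        resolution += n_k * (observed - base_rate) ** 2
    reliability /= n
    resolution /= n

    uncertainty = base_rate * (1.0 - base_rate)  # depends on the data only
    brier = np.mean((p - y) ** 2)                # overall prediction error
    return {
        "brier": brier,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
    }

# Hypothetical predictions and outcomes for six individuals.
result = brier_decomposition([0.1, 0.1, 0.9, 0.9, 0.5, 0.5],
                             [0, 0, 1, 1, 0, 1])
```

In this form, a calibration term can be recovered from the overall error, the resolution term, and the outcome base rate, which is the kind of relationship aim (2) refers to. When predictions are binned into ranges rather than grouped by exact value, the identity holds only approximately.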