Studies of healthcare utilization based on national and state administrative databases have been hampered by a lack of powerful methods for overcoming the limited specificity common in these databases. New statistical and econometric techniques are needed to uncover hidden structures in healthcare utilization data. This research plan addresses the development and application of innovative regression models for jointly analyzing healthcare utilization and outcomes from clinical and observational studies. All models incorporate observed heterogeneity due to patient and system characteristics. In addition, we explicitly account for unobserved heterogeneity due to omitted variables, endogeneity of explanatory variables, and censoring in outcome measures. Specifically, we develop and test several statistical models to analyze measures of utilization (eg, length of stay, cost) and health outcomes (eg, survival, quality of life). The complexity of the models depends on the structure and coarseness of the available data (eg, longitudinal, cross-sectional, hierarchical) and the richness of the observed covariates. Markov models are used with longitudinal data to account for the dynamics of patient movement between health states (eg, relapse, remission), with covariate effects incorporated in the transition intensities through multiplicative intensity and proportional hazards models. Heterogeneity due to unobserved or omitted variables is accommodated through random effects, frailties, and latent class models. We use Coxian phase-type models to elicit hidden Markov structures in cross-sectional data on healthcare utilization. Hierarchical models are applied to accommodate complex sampling designs and clustering (eg, patients within hospitals, hospitals within geographic units). All models will be rigorously tested in simulation and cross-validation studies. We will demonstrate the methods in three studies of healthcare utilization and outcomes.
(1) Using the Nationwide Inpatient Sample, we jointly estimate total hospital charges and length of stay associated with procedures for two broad disease categories, heart disease and cancer. (2) Using a linked data set of Michigan Medicare, Medicaid, and Cancer & Death Certificate Registries, we estimate the cost of treatment and survival in patients with colon, breast, lung, and prostate cancer, controlling for observed covariates and unobserved heterogeneity. (3) Using a proprietary longitudinal data set of patient functioning during cancer treatment, we jointly estimate costs of care, survival, and physical function, and assess the impact of changes in physical function on cost and survival. By expanding the repertoire of analytic tools for health services researchers, this project will provide methods for extracting valuable information on healthcare utilization and outcomes from administrative databases that can be used to inform cost-effectiveness analyses and health policy.
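
As an illustration of the Coxian phase-type idea mentioned above, the following minimal sketch treats length of stay as the time to absorption in a chain of latent phases: from each phase a patient either progresses to the next phase or is discharged. The function names, the two-phase structure, and all rate values are hypothetical choices made for illustration only; they are not estimates from, or the models of, this project.

```python
import random


def coxian_mean(progress_rates, exit_rates):
    """Analytic mean time to absorption of a Coxian phase-type distribution.

    progress_rates[i]: rate of moving from phase i to phase i + 1 (length n - 1)
    exit_rates[i]:     rate of leaving the system (discharge) from phase i (length n)
    """
    mean, reach_prob = 0.0, 1.0
    for i, exit_rate in enumerate(exit_rates):
        # The last phase has no onward transition.
        progress = progress_rates[i] if i < len(progress_rates) else 0.0
        total = progress + exit_rate
        mean += reach_prob / total        # expected sojourn, weighted by P(reach phase i)
        reach_prob *= progress / total    # probability of moving on to the next phase
    return mean


def simulate_stay(progress_rates, exit_rates, rng):
    """Draw one length of stay by simulating the latent phases."""
    elapsed = 0.0
    for i, exit_rate in enumerate(exit_rates):
        progress = progress_rates[i] if i < len(progress_rates) else 0.0
        total = progress + exit_rate
        elapsed += rng.expovariate(total)       # exponential sojourn in phase i
        if rng.random() >= progress / total:    # discharged from this phase
            return elapsed
    return elapsed


# Hypothetical two-phase example (rates per day): a short-stay phase followed
# by a long-stay phase that one third of patients reach.
PROGRESS = [0.5]
EXIT = [1.0, 0.2]

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [simulate_stay(PROGRESS, EXIT, rng) for _ in range(20000)]
    print("analytic mean:", coxian_mean(PROGRESS, EXIT))
    print("simulated mean:", sum(draws) / len(draws))
```

With these illustrative rates the analytic mean is 1/1.5 + (1/3)(1/0.2) = 7/3, or about 2.33 days, and the simulated mean should be close to it. Only the total length of stay is observed in cross-sectional data; the phases themselves are latent, which is what allows such a model to suggest hidden short-stay and long-stay structure.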