Two challenges in analyzing the health effects of multi-pollutant, long-term air pollution exposure are (i) interpreting parameters in health effect regressions, which requires dimension reduction by principal component analysis, clustering, or related methods, and (ii) spatial misalignment of exposure data. Spatial misalignment refers to the situation where exposure data are not available at the locations where subjects live, so exposures must be estimated using a spatial prediction model fit to monitoring data from other locations. This first-stage exposure model typically combines regression on geographic covariates with spatial smoothing. Predictions from the first-stage model are then used in a health effect regression model. Extending this paradigm to multi-pollutant studies requires generalizing spatial prediction methods to multivariate exposure vectors, ideally using a method that is synergistic with (or at least compatible with) the dimension reduction used for health effect analyses. We propose two novel methods: predictive sparse principal component analysis and predictive k-means clustering. Our methods seek sparse principal component loadings and k-means cluster centers that explain a large proportion of the variability in the data while ensuring that the corresponding low-dimensional representations are predictable at subject locations. Predictions of these lower-dimensional quantities can then be used in health effect regressions. Our approach is preferable to a sequential approach (dimension reduction followed by spatial prediction), which may yield representations that are difficult to predict at subject locations. We illustrate the practical utility of our methods by applying them to national monitoring data from EPA regulatory networks and to epidemiologic analyses of the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air).
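The sequential baseline mentioned above (dimension reduction followed by spatial prediction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the proposed predictive methods: it runs ordinary (non-sparse) PCA on monitor data, then predicts the principal component scores at subject locations by least-squares regression on geographic covariates only, omitting the spatial smoothing step. The function name `sequential_pca_prediction`, its arguments, and the synthetic data are hypothetical. The in-sample R² of each score regression gives a rough sense of how predictable each component is at monitors.

```python
import numpy as np


def sequential_pca_prediction(X_mon, Z_mon, Z_sub, n_components=2):
    """Hypothetical sketch of the sequential approach.

    X_mon: (n_mon, p) pollutant concentrations at monitors
    Z_mon: (n_mon, q) geographic covariates at monitors
    Z_sub: (n_sub, q) geographic covariates at subject locations
    Assumes covariate regression only (no spatial smoothing).
    """
    # Step 1: dimension reduction (ordinary PCA via SVD of centered data).
    mu = X_mon.mean(axis=0)
    Xc = X_mon - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T          # (p, k) loadings, orthonormal columns
    scores = Xc @ V                  # (n_mon, k) PC scores at monitors

    # Step 2: spatial prediction of the scores by least squares on covariates.
    A_mon = np.column_stack([np.ones(len(Z_mon)), Z_mon])
    beta, *_ = np.linalg.lstsq(A_mon, scores, rcond=None)
    A_sub = np.column_stack([np.ones(len(Z_sub)), Z_sub])
    pred = A_sub @ beta              # (n_sub, k) predicted scores at subjects

    # In-sample R^2 per component (not cross-validated).
    fitted = A_mon @ beta
    ss_res = ((scores - fitted) ** 2).sum(axis=0)
    ss_tot = ((scores - scores.mean(axis=0)) ** 2).sum(axis=0)
    r2 = 1.0 - ss_res / ss_tot
    return V, pred, r2


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
n_mon, n_sub, p, q = 200, 50, 5, 3
Z_mon = rng.normal(size=(n_mon, q))
Z_sub = rng.normal(size=(n_sub, q))
B = rng.normal(size=(q, p))
X_mon = Z_mon @ B + rng.normal(size=(n_mon, p))

V, pred, r2 = sequential_pca_prediction(X_mon, Z_mon, Z_sub, n_components=2)
print(V.shape, pred.shape, np.round(r2, 2))
```

Because the loadings in step 1 are chosen without regard to the covariates, nothing guarantees that the resulting scores are well predicted in step 2; a low R² for a component is the kind of difficulty the predictive methods are designed to avoid.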