This project aims to provide biologists with new tools to help them understand complex systems for which they have different sources of heterogeneous 'in situ' data. These data present many levels of heterogeneity and come with spatio-temporal and prior information that needs to be incorporated into integrated data structures. The collaboration starts with the design of the collection process and provides tools for data integration and analysis built around the statistics package R and an interactive image analysis program, GEMEDENT, written in Java. The project concentrates on two specific types of heterogeneous data: metagenomic data and sequence mixtures provided by the new pyrosequencing machines, and cell image data provided by automated microscopes. The first type of heterogeneous data consists of microbial soil sample data collected by Alfred Spormann from Civil and Environmental Engineering at Stanford. The proposal focuses on applying Bayesian computations to the design of sample locations and the number of sequences collected, and then using spectral multivariate methods to analyze diversity indices as tables (instead of summaries), thus incorporating the data structure into the decompositions. These methods will also be useful in the study of mixture data from pyrosequencing of HIV, bacteria, viruses and cancer cells. The second study focuses on the interaction between immune cells and breast cancer, in a collaboration with Peter Lee, hematologist at Stanford. We will analyze data from microscope images of stained lymph nodes. An integrated image analysis system enables the automatic detection of the location and size of many different cell types from stained images. Random forests have been incorporated into the image analysis system, and an interactive boosting component lets the user iterate the learning process until a desired level of accuracy is attained.
These data enable us to infer the spatial and dynamic interaction between the tumors and the immune cells. A postdoctoral fellow will be in charge of combining the cell data with the clinical history and the microarray expression data from the same patient. The heterogeneity will be handled using exploratory multivariate techniques based on spectral analysis, kernel methods and graphical representations.