Gene expression measurement using cDNA and oligonucleotide arrays continues to be a popular and useful technology for genomic analysis, and high-throughput methods for measuring protein concentrations are also growing in popularity. One of the more challenging problems is the large volume of data these experiments generate. Image capture, processing, interpretation and quantification remain fundamental issues, and quality control and experimental design must be carefully addressed. Many difficult statistical, image processing and bioinformatics issues remain, and this project addresses them. We have focused our research on developing new methods for the analysis of gene splicing, based on microarray platforms designed specifically for that purpose. We have developed and enhanced an ANOVA-based analysis strategy for efficient detection of potential splice events, and applied this technique to major publicly available data sets. We have also recently applied these approaches to studies of a mouse knock-out model, a model of the inflammatory response of immune cells, and the response of cells to an anti-cancer agent. Two measurement platforms are being studied: the Affymetrix exon array and the ExonHit junction probe array. The entire Framingham Heart Survey SABRe project has begun to use this new technology, which increases the available transcriptional information roughly tenfold compared with standard expression arrays. For almost a decade, our group has functioned as the "statistical analysis core" for a high-volume microarray laboratory in CCMD/CC, and all microarray studies by that laboratory now pass through our analysis pipeline. We now also serve as the analysis core for the NHLBI microarray core facility, which has more than tripled the throughput of microarray studies into our database and pipeline.
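The ANOVA-based screen for potential splice events mentioned above can be illustrated with a minimal sketch. It is not the project's actual model, which the text does not specify. It assumes one common formulation: for a single gene, log2 intensities are arranged by sample group and exon, and a large exon-by-group interaction F statistic suggests that the exons do not all change in parallel, which is one signature of alternative splicing. The function name `interaction_f`, the data layout and the example values are hypothetical, and the sketch treats all measurements as independent in a balanced design, ignoring the correlation among exons measured on the same sample.

```python
from statistics import mean


def interaction_f(data):
    """Exon-by-group interaction F statistic for one gene.

    data[group][exon] is a list of log2 intensities. Every cell must hold
    the same number of replicates (balanced design, at least two per cell).
    Returns (F, df_interaction, df_error).
    """
    groups = list(data)
    exons = list(data[groups[0]])
    n = len(data[groups[0]][exons[0]])

    # Cell, marginal and grand means of the two-way layout.
    cell = {(g, e): mean(data[g][e]) for g in groups for e in exons}
    group_mean = {g: mean(cell[g, e] for e in exons) for g in groups}
    exon_mean = {e: mean(cell[g, e] for g in groups) for e in exons}
    grand = mean(cell.values())

    # Interaction sum of squares: what is left of each cell mean after
    # removing the additive group and exon effects.
    ss_int = n * sum(
        (cell[g, e] - group_mean[g] - exon_mean[e] + grand) ** 2
        for g in groups for e in exons
    )
    # Error sum of squares: replicate scatter around each cell mean.
    ss_err = sum(
        (y - cell[g, e]) ** 2
        for g in groups for e in exons for y in data[g][e]
    )
    df_int = (len(groups) - 1) * (len(exons) - 1)
    df_err = len(groups) * len(exons) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical gene: exon2 drops in the treated group while exon1 and
# exon3 stay level, as would be expected if exon2 were skipped.
gene = {
    "control": {"exon1": [8.0, 8.2, 7.9], "exon2": [9.0, 9.1, 8.9], "exon3": [7.0, 7.1, 6.9]},
    "treated": {"exon1": [8.1, 8.0, 8.2], "exon2": [7.0, 7.2, 6.9], "exon3": [7.0, 6.9, 7.2]},
}
f_stat, df_int, df_err = interaction_f(gene)
print(f"interaction F = {f_stat:.1f} on ({df_int}, {df_err}) degrees of freedom")
```

In practice the F statistic would be referred to an F distribution with the returned degrees of freedom, and a whole-genome screen would need multiple-testing control. A gene whose exons all shift by the same amount between groups gives an interaction F near zero, so a uniform change in overall expression is not flagged.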
Our role as an analysis core has generated more than a dozen new collaborative projects per year, in which our staff are primarily responsible for statistical analysis and interpretation of microarray data. This year, our group completed the analysis of the first component of the multi-year Framingham Heart Survey SABRe project, in which about 5,000 biological samples will ultimately be analyzed for microarray expression profiles. In combination with clinical and other laboratory data, this dataset is expected to advance the understanding of expression signatures in heart disease. The first component, a feasibility study, analyzed samples from 50 individuals, with four blood-derived sample types per individual: PBMC, lymphoblastoid cell lines, PaxGene tubes and buffy coat. The technical goal is to choose the best, or at least usable, sample types for analysis in the larger study. The results show that PBMC and PaxGene tubes yield results of roughly equal quality. The availability of affordable, high-quality software has been one of the bottlenecks in the analysis of microarray data. We have continued development of the "MSCL Analyst's Toolbox" to address this need. Built upon the commercial statistical package JMP, this toolbox allows investigators to download Affymetrix microarray data from a central database, normalize and transform the data, inspect it for a variety of outliers or defects, perform a variety of statistical tests to select relevant genes affected in the experiment, and then visualize and classify various patterns of gene expression. Because the Toolbox is written as open-source scripts, its statistical tests can be modified as needed to conform to novel or unique experimental designs. In collaboration with over forty investigators in CC, NHLBI, NIDCR and other ICs, this tool has been applied to several dozen microarray studies. One-day and two-day Toolbox training workshops are regularly presented on the NIH campus.
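The Toolbox workflow described above (transform, normalize, then test each gene) can be sketched in a few lines. This is not the Toolbox itself, which is built on JMP scripts; it is a Python illustration under stated assumptions: a log2 transform, median centering of each array, and a Welch t statistic per gene stand in for whatever transforms and tests the Toolbox actually offers. All function names, array names, gene names and intensities are hypothetical.

```python
from math import log2, sqrt
from statistics import mean, median, variance


def log2_median_center(arrays):
    """arrays: {array_name: {gene: raw_intensity}}.

    Returns log2 values with each array shifted so that its median is 0,
    which removes overall brightness differences between arrays.
    """
    out = {}
    for name, intensities in arrays.items():
        logged = {gene: log2(v) for gene, v in intensities.items()}
        center = median(logged.values())
        out[name] = {gene: x - center for gene, x in logged.items()}
    return out


def welch_t(xs, ys):
    """Welch t statistic; returns 0.0 when both groups have zero variance."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return 0.0 if se == 0 else (mean(xs) - mean(ys)) / se


def rank_genes(normalized, group_a, group_b):
    """Order genes by the absolute t statistic between two groups of arrays."""
    genes = list(normalized[group_a[0]])
    t_stats = {
        g: welch_t([normalized[a][g] for a in group_a],
                   [normalized[b][g] for b in group_b])
        for g in genes
    }
    return sorted(genes, key=lambda g: abs(t_stats[g]), reverse=True)


# Hypothetical raw intensities: g5 is roughly fourfold higher in the
# treated arrays; the other genes differ only by noise.
raw = {
    "c1": {"g1": 100, "g2": 200, "g3": 400, "g4": 800, "g5": 1600},
    "c2": {"g1": 120, "g2": 210, "g3": 440, "g4": 850, "g5": 1700},
    "t1": {"g1": 90, "g2": 190, "g3": 380, "g4": 820, "g5": 6000},
    "t2": {"g1": 110, "g2": 230, "g3": 420, "g4": 780, "g5": 6800},
}
normalized = log2_median_center(raw)
ranked = rank_genes(normalized, ["c1", "c2"], ["t1", "t2"])
print(ranked)
```

With only five genes the median-centering step is fragile; on a real array with tens of thousands of probe sets the median is a far more stable reference, and an outlier inspection step would normally sit between normalization and testing.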
In a major NIH-wide project, we maintain NIHLIMS, a database for storage, retrieval and analysis of Affymetrix microarrays. As part of this collaboration, we have created a data analysis pipeline and bioinformatics toolset that includes both commercial and freely available software. The database currently stores information from over 4,000 microarrays. Our downloadable tool set (the MSCL Analyst's Toolbox) is now mature, widely tested and applied in numerous studies. Working with investigators in NCI, CC, NHLBI, NINDS, NIAID, NHGRI, NICHD, NIA, NIDDK and NIDA, we have developed, customized and applied this software for the analysis of microarray-based studies. We also maintain a set of annotation files for use with Affymetrix data, updated quarterly and formatted for convenient download and use by our collaborators. In another study, with investigators in NEI, we are using microarray technology to evaluate the utility of several biological models for age-related macular degeneration and for retinal pigment epithelium (RPE) tissue development. Preliminary results show that RPE tissues from several sources can be clearly distinguished from non-RPE tissues by the increased expression of a number of RPE-specific genes. We are now investigating the properties of RNAseq, a method for more accurately assessing the transcriptome using next-generation sequencing technology. In one project, with investigators in NHGRI, we are assessing the reproducibility of the methodology, both within subject and within lane. In another, we have analyzed the transcriptome of the rat pineal gland during both day and nighttime. We have found a large number of unexpected new differences, as well as dozens of expression differences already known from microarray analysis. Indeed, about 50% of the "reads" generated in this study do not belong to well-documented rat genes, and presumably result from novel transcription from portions of the genome not yet annotated.
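The proportion of reads falling outside annotated genes, as quoted above, is in principle a simple interval-overlap count. The sketch below shows one way such a fraction could be computed; it is not the pipeline used in the study. It assumes that reads are already aligned and that both reads and gene annotations are given as half-open `(chromosome, start, end)` intervals. The function names and all coordinates are hypothetical.

```python
from bisect import bisect_left


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into sorted, disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def unannotated_fraction(reads, genes):
    """Fraction of reads that overlap no annotated gene interval."""
    by_chrom = {}
    for chrom, start, end in genes:
        by_chrom.setdefault(chrom, []).append((start, end))
    # Per chromosome: sorted interval starts plus the merged intervals.
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], merged)

    misses = 0
    for chrom, start, end in reads:
        starts, merged = index.get(chrom, ([], []))
        # Last gene interval that begins before the read ends; because the
        # intervals are disjoint, it is the only candidate for an overlap.
        i = bisect_left(starts, end) - 1
        if i < 0 or merged[i][1] <= start:
            misses += 1
    return misses / len(reads)


# Hypothetical annotation and aligned reads.
genes = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
reads = [
    ("chr1", 120, 170),  # inside an annotated gene
    ("chr1", 300, 350),  # starts exactly where the annotation ends
    ("chr2", 70, 100),   # partially overlaps a gene
    ("chr3", 10, 50),    # chromosome with no annotation
]
fraction = unannotated_fraction(reads, genes)
print(fraction)
```

A read counts as annotated here if it overlaps a gene interval by even one base; a stricter rule, such as requiring the whole read to lie within an exon, would raise the unannotated fraction.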