The increasing use of genome-wide gene expression profiling has generated a wealth of valuable data that offer cost-effective opportunities to investigate research questions beyond the purpose for which the data were originally collected. Our goal is to develop systematic, statistically rigorous approaches for the secondary analysis of existing microarray expression databases. Throughout this development we will emphasize consistency between the biological background and the statistical models, because such consistency is critical to the biological relevance of the resulting analysis tools. The retina is a relatively simple and well-characterized region of the central nervous system, and more than 200 genes that cause retinal diseases have been identified to date. We will apply the new methods to retinal microarray expression databases to identify novel genes and gene-gene relationships (pathways) that govern normal and pathological processes in the retina. We will also explore the feasibility of predicting eye disease by searching and comparing against public databases. We propose the following specific studies: 1) to develop novel analytical and statistical methods for detecting the genes involved in a biological pathway, using a statistical strategy that has partial correlation as its core component; 2) to take the first step toward turning microarray repositories into a disease diagnosis database, by developing a Bayesian probabilistic method that infers the disease condition of a query microarray data set from its similarity to well-characterized data sets in the database; 3) to experimentally validate a subset of the in silico predictions. 
We will verify the expression of genes newly identified in the first study using standard methods such as real-time PCR and Western blotting; 4) to expand our existing software package, Gene Expression Analyzer (GEA) (http://cell.rutgers.edu/gea/), to include the newly developed methods. The source code will be made publicly available. The outcome of this project will substantially facilitate the reuse of the vast amount of public data sets to answer new research questions, reduce the need to generate new data, and improve our understanding of cellular functions and networks under a variety of perturbations.
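To illustrate why partial correlation is useful for pathway detection (study 1), the following minimal sketch computes full-order partial correlations from the inverse of the gene-gene covariance matrix. It is not the proposed strategy itself. The function name `partial_correlation` and the simulated three-gene chain are hypothetical, and the sketch assumes more arrays (samples) than genes so that the covariance matrix is invertible; genome-scale data would require a regularized (for example, shrinkage) covariance estimate instead.

```python
import numpy as np


def partial_correlation(expr: np.ndarray) -> np.ndarray:
    """Full-order partial correlation matrix for the genes in the columns of `expr`.

    expr: samples x genes array. Entry (i, j) of the result is the correlation
    between genes i and j after conditioning on all other genes.
    Assumes samples > genes, so the covariance matrix is invertible.
    """
    cov = np.cov(expr, rowvar=False)       # genes x genes covariance matrix
    prec = np.linalg.inv(cov)              # precision (inverse covariance) matrix
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)          # rho_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcor, 1.0)            # a gene is perfectly correlated with itself
    return pcor


# Hypothetical simulated chain: gene X regulates Y, and Y regulates Z.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
expr = np.column_stack([x, y, z])

marginal = np.corrcoef(expr, rowvar=False)
pcor = partial_correlation(expr)
print("marginal corr(X, Z):", round(marginal[0, 2], 2))
print("partial corr(X, Z | Y):", round(pcor[0, 2], 2))
```

X and Z are strongly correlated marginally, but their partial correlation given Y is close to zero. This is the property that lets partial correlation separate direct gene-gene links from indirect ones mediated by a third gene.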
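For study 2, the idea of inferring a query's disease condition from its similarity to well-characterized data sets can be sketched as a simple Bayes rule. This is a hypothetical illustration, not the method to be developed. It assumes that similarity is the Pearson correlation between expression profiles, that each condition's likelihood is the average of `exp(beta * r)` over that condition's reference profiles, and that the prior is uniform unless one is given. The function `disease_posterior`, the parameter `beta`, and the simulated "normal" and "disease" profiles are all illustrative assumptions.

```python
import numpy as np


def disease_posterior(query, ref_profiles, ref_labels, beta=10.0, prior=None):
    """Posterior P(condition | query) from similarity to labeled reference profiles.

    Hypothetical sketch: the likelihood of each condition is modeled as the mean
    of exp(beta * r) over that condition's references, where r is the Pearson
    correlation between the query and a reference profile.
    """
    labels = sorted(set(ref_labels))
    ref_labels = np.asarray(ref_labels)
    # Pearson correlation of the query with each reference profile (one per row).
    sims = np.array([np.corrcoef(query, ref)[0, 1] for ref in ref_profiles])
    kernel = np.exp(beta * sims)
    likelihood = np.array([kernel[ref_labels == c].mean() for c in labels])
    if prior is None:
        prior_vec = np.full(len(labels), 1.0 / len(labels))  # uniform prior
    else:
        prior_vec = np.array([prior[c] for c in labels])
    unnormalized = prior_vec * likelihood
    return dict(zip(labels, unnormalized / unnormalized.sum()))


# Hypothetical database: five reference arrays per condition over 50 genes.
rng = np.random.default_rng(1)
n_genes = 50
signatures = {"normal": rng.normal(size=n_genes), "disease": rng.normal(size=n_genes)}
refs, labels = [], []
for condition, signature in signatures.items():
    for _ in range(5):
        refs.append(signature + 0.3 * rng.normal(size=n_genes))
        labels.append(condition)

query = signatures["disease"] + 0.3 * rng.normal(size=n_genes)
post = disease_posterior(query, np.array(refs), labels)
print(post)
```

A larger `beta` makes the posterior depend more sharply on the closest references. In practice this parameter, the similarity measure, and the prior would all need to be chosen or estimated from the database.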