Due to a variety of experimental and computational issues, gene expression microarray data sets are far from optimally exploited. Limited sensitivity, widespread and spurious cross-hybridization, and varying labeling efficiency are some of the well-known problems preventing the extraction of robust, platform-independent results from gene expression data. A careful combination of statistical and biochemical methods will be needed to produce reliable gene expression estimates. However, the current ingenious computational algorithms for analyzing microarray data cannot be appropriately evaluated due to the lack of unbiased, realistic data sets, complete with raw data files and a large number of independent confirmations. We will establish exactly such a database, depositing our own data and actively collecting submissions from collaborators. We will demonstrate the utility of the proposed database by determining the relative merit of several widely used microarray normalization algorithms. We will also develop probe-sequence-based methods to reduce cross-hybridization noise in microarray measurements, in order to enable the robust analysis of disease-associated global gene expression profiles across different technologies and different data sources. The subset of reliable probes will be empirically verified and then distributed through Bioconductor. Microarray analysis is a powerful tool for improving disease classification and identifying novel therapeutic targets, among other applications. The proposed database, along with the identification of reliable microarray probes, will increase the reliability and potential utility of this promising technology.