This proposal outlines strategies for improving current microarray data analysis tools by using control mRNA microarray datasets as benchmarks. In these datasets, control mRNA molecules are spiked into the hybridization mix at known absolute or relative concentrations, which can help to identify the most appropriate analysis protocol. First, spiked-in controls yield datasets in which every differentially regulated gene is known; the numerous analysis methods proposed in the literature can therefore be tested on their ability to recover these genes. Second, investigating the sequence and concentration dependence of the RNA/DNA hybridization reaction could lead to greatly improved methods for deducing gene expression levels from chip intensity levels, ideally accounting for cross-hybridization. Finally, other control dataset designs will be proposed to address the variability introduced during the in vitro transcription and labeling steps, as well as during RNA amplification procedures. The discussion will focus on Affymetrix GeneChip microarray data, specifically the Latin-square human and E. coli datasets created by Affymetrix and a Drosophila dataset generated by collaborators Dr. Marc Halfon and Dr. Michael Boutros.
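As an illustration of the first aim, the following minimal sketch shows one way a spike-in dataset could serve as a benchmark: because the spiked-in genes are known, each analysis method can be scored on how well it ranks them above the unchanged genes. The choice of the area under the ROC curve as the score, the function name `roc_auc`, the gene identifiers, and all numeric values are assumptions made for illustration; they are not taken from the proposal or from any of the datasets it names.

```python
def roc_auc(scores, spiked):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen spiked-in gene receives a higher differential-expression score
    than a randomly chosen unchanged gene (ties count as one half).

    scores -- dict mapping gene id to the score assigned by one analysis method
    spiked -- set of gene ids known to be differentially regulated
    """
    pos = [s for g, s in scores.items() if g in spiked]
    neg = [s for g, s in scores.items() if g not in spiked]
    if not pos or not neg:
        raise ValueError("need at least one spiked and one unspiked gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two spiked-in controls and three unchanged genes.
spiked = {"spike_1", "spike_2"}

# Hypothetical scores (e.g. absolute log fold change) from two analysis methods.
method_a = {"spike_1": 4.1, "spike_2": 3.2, "gene_x": 0.4, "gene_y": 1.0, "gene_z": 0.2}
method_b = {"spike_1": 4.1, "spike_2": 0.3, "gene_x": 0.4, "gene_y": 1.0, "gene_z": 0.2}

auc_a = roc_auc(method_a, spiked)  # method A ranks both spikes above all other genes
auc_b = roc_auc(method_b, spiked)  # method B misses one spike

print(f"method A AUC: {auc_a:.3f}")
print(f"method B AUC: {auc_b:.3f}")
```

In this toy comparison the method that ranks both spiked-in genes first scores higher; on a real control dataset the same comparison would be run over every probe set and every candidate method.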