Gene expression measurement technology has evolved rapidly over the last two years, with the use of large arrays of cDNA taken from EST libraries receiving considerable attention. Fluorescent-tagged probe DNA hybridized to microarrays printed on glass is one promising technology. Another approach, which hybridizes P33-labeled probe DNA to arrays on nylon membranes, has the advantages of readily available reagents and instrumentation, greater sensitivity, and the ability to use smaller samples. Images of arrays of either sort must be quantified to produce a list of numerical intensities proportional to the expression levels corresponding to the gene fragments placed at each spot in the array. Using bioinformatics tools, spots on these arrays must be associated with sequence information for the corresponding clones, UniGene clusters, genes and protein products. Links to structural and functional information are also required.
Numerous statistical, image processing and bioinformatics problems confront users of these technologies. As arrays can be constructed to contain thousands of spots, manual analysis of the resulting images is not feasible. Further, as investigators seek to couple this technology with laser capture microdissection (LCM) in the analysis of pathological tissue, the technology itself must be refined and improved. Accordingly, this project seeks to address problems in this area at the statistical, numerical, computational, and informatics levels.
Progress in FY98: Working with several laboratories in NCI, NICHD, NIHGR, and NIDR, we have analyzed over 100 array images to estimate intensity levels representing over 500,000 DNA hybridization measurements. To accomplish this, a new program, PSCAN, was developed, which facilitates the image-processing steps of the analysis and produces optimal estimates of spot intensities. The program is written in MATLAB, the code is being made publicly available, and a Web distribution site is being established.
Initially, the image analysis problem requires parsing the image, recognizing and identifying the "lattice" of spots, determining local or global background intensity, and estimating the "spread" of intensity due to the non-locality of radioactive nuclide detection. Appropriate corrections for these factors are attempted.
Our analysis method relies on a number of data visualization tools, and allows users to identify significantly over- or under-expressed genes in a comparative study. Importantly, these techniques also allow users to identify experimental artifacts, outliers and other data anomalies that are present in a large percentage of hybridization studies, such as non-constant background hybridization, image defects, dropouts, printing artifacts, and spot bleeds. A set of rules has evolved that allows for calibration of two images, comparison of more than two images, determination of the correlation of spot intensities between experiments, and use of principal components and clustering to identify potential patterns of expression within a moderate to large set of experiments.
Our statistical analysis has quantified the contributions to measurement error due to a variety of factors, including spot-to-spot, filter-to-filter, hybridization, mRNA preparation, cDNA preparation, quantification, image acquisition, and sample-to-sample variability. Our findings emphasize the need for careful control of each of these sources before this technique is deployed in large-scale clinical studies. Our findings will also be valuable to developers of arrays in that they suggest improved spotting configurations and array designs. Moreover, as experience with each array format across several different laboratories increases, the bioinformatic value of each spot or clone increases.
Spots that are highly sensitive to experimental conditions can be identified, while spots whose intensity is essentially unchanged, possibly identified as "control" spots, may become useful for calibration purposes. By applying our software tools to many experiments, we have gained practical experience with this assay technique, and now collaborate in the refinement of this technology and the design of future experiments.
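The calibration and comparison rules themselves are not spelled out in this summary. As a rough illustration only, the following Python sketch shows one simple way two lists of spot intensities might be calibrated against each other and compared. It is not PSCAN code (PSCAN is written in MATLAB), and every name in it, the median-matching rule, and the two-fold threshold are assumptions made for the example.

```python
import math
from statistics import median


def calibrate(reference, other):
    """Rescale `other` so its median matches that of `reference`.

    Median matching is an assumed calibration rule, chosen for simplicity.
    """
    scale = median(reference) / median(other)
    return [value * scale for value in other]


def log2_ratios(reference, other):
    """Per-spot log2(other / reference); all intensities must be positive."""
    return [math.log2(o / r) for r, o in zip(reference, other)]


def classify(ratios, threshold=1.0):
    """Label each spot; threshold=1.0 means a two-fold change (assumed cutoff)."""
    labels = []
    for ratio in ratios:
        if ratio >= threshold:
            labels.append("over")
        elif ratio <= -threshold:
            labels.append("under")
        else:
            labels.append("unchanged")  # candidate "control" spot
    return labels


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up, background-corrected intensities for five spots on two arrays.
array_a = [100, 200, 400, 800, 1600]
array_b = [50, 25, 200, 1600, 800]

array_b_cal = calibrate(array_a, array_b)
ratios = log2_ratios(array_a, array_b_cal)
labels = classify(ratios)
correlation = pearson([math.log2(v) for v in array_a],
                      [math.log2(v) for v in array_b_cal])
print(labels)
print(round(correlation, 3))
```

In this toy data set the second spot is labeled "under" and the fourth "over", while the remaining three are "unchanged" and would be the kind of spot that could serve as a calibration control. A real analysis would first need the background and spread corrections described above, and replicate arrays to judge whether a given ratio is significant.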