Development of an organ as complex as the brain must depend on an intricate interplay of thousands of signaling proteins, orchestrated by an interacting web of regulatory factors. Recent ENCODE data reveal that the large tracts of so-called `junk' DNA in introns and between genes are, in fact, actively transcribed. The functions of these non-coding transcripts, which number in the millions, are virtually unknown, although one very small sub-class - the microRNAs - is receiving close attention as regulatory RNAs. One process that may be controlled by the non-coding transcripts is alternative exon use - a mechanism that adds significantly to the diversity of cellular proteins (particularly in the CNS), and whose regulation is little understood at this time. Recent technological advances that generate whole-transcriptome data have provided the means to systematically explore both of these factors - the role of non-coding transcripts, and the occurrence and control of splice variation. However, the bioinformatic challenges posed by these technologies are substantial, and the lack of comprehensive, well-designed, and easily used software to manage, visualize, analyze and interpret these data will likely be the limiting factor in this field of research. We propose to develop a bioinformatics toolkit specifically to mine whole-transcriptome data from two fundamentally different technologies: the Affymetrix All Exon microarray, which provides measures of over 1.4 million distinct transcripts, and the Solexa ultra-high-throughput sequencer, which provides for `digital' expression analysis of the whole transcriptome. The design and development of the software will be guided by prominent scientists engaged in the study of the brain, and the software will be applied to sample data sets derived from neurological tissue, to ensure that the program incorporates functions and annotations relevant to this field.
PUBLIC HEALTH RELEVANCE: While large-scale array technologies have provided an unprecedented capability to model cellular processes in the brain, both in normal functioning and in disease states, this capability is utterly dependent on the availability of complex data management, computational, statistical and informatic software tools. The utility of the next generation of arrays - which focus on critical regulation and control functions of the cell - will be stymied by an initial lack of suitable bioinformatic tools. This proposal initiates an accelerated development of an integrated software package intended to empower biologists in the application and analysis of these powerful new technologies, with broad-reaching impact at all levels of biological and clinical research, and across every discipline.