The expression of the genetic information in our DNA involves four basic processes: 1) transcription into RNA; 2) splicing together the fragments of information in this precursor RNA (pre-mRNA) to form messenger RNA (mRNA); 3) translation of the mRNA into proteins; and 4) modification of the proteins to make them functional. This proposal focuses on the second process, pre-mRNA splicing. Although the chemistry of the splicing reaction is fairly well understood, it is not yet clear how the cell recognizes the boundaries of the few relatively short regions that code for protein (the exons, ~100 nucleotides (nt) long, ~10 per transcript) within a long (~20,000 nt) pre-mRNA molecule. The splice sites themselves consist of sequences with specific features. For example, each region that is spliced out (the intron) almost always begins with GT and ends with AG. However, splice site sequences are not distinctive enough on their own to mark exon boundaries unambiguously. We will pursue four approaches with the aim of deciphering the "splicing code," i.e., the sequence elements and rules that allow splice sites to be recognized within the sequence of the pre-mRNA or DNA:

1) We will insert all possible 6-nt sequences (4^6 = 4,096) into a weakened exon to define the complete set of those that can enhance splicing. By repeating these experiments after altering the exon in various ways and comparing the sequences recovered, we will learn how different parts of the overall sequence interact to create a signal. These experiments exploit recently developed methods for massively parallel sequencing of short DNA regions.

2) We have found that limited intronic regions just outside the exon can play powerful roles in splice site recognition, but little is known about the general nature or mechanism of action of these sequences. We will investigate how the position and protein-binding properties of these intronic enhancers affect splicing and chromatin structure. Our use of a cellular gene for this purpose is an improvement over the less natural test systems currently in use.

3) It now appears that the density of signals influencing splicing is very high, so any manipulation of a natural sequence is likely to change more than one signal at once. To minimize this effect, we will build synthetic exons from insulated modules of known effect (splicing enhancers and silencers). By arranging these modules in various permutations, we will learn the rules governing their interactions.

4) Statistical analysis of the human genome sequence has allowed the successful prediction of exonic enhancers and silencers. We will extend such computational approaches to search for intronic and exonic signals that cooperate to enhance splicing and that may act to silence false splice sites.

Many human genetic diseases are caused by splicing defects, and cancer cells often exhibit abnormal splicing patterns. Knowledge of the splicing code will enable this process to be targeted therapeutically, for example to correct a deficiency in a genetic disease or to disrupt a harmful splicing event in a tumor.

PUBLIC HEALTH RELEVANCE: Human genes control our lives through the translation of their information into the proteins that operate our cells. That genetic information is present as fragments that must be spliced together to make sense, and disruption of the splicing process causes many genetic diseases and can contribute to cancer. Our proposal is aimed at understanding how this splicing takes place.
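The hexamer screen in approach 1 rests on a simple combinatorial fact: with four bases there are 4^6 = 4,096 distinct 6-nt sequences. The minimal Python sketch below enumerates that library and scores each hexamer by how much more often it is recovered from spliced mRNA than from the starting pool. The read-count inputs, the pseudocount, the log2-ratio score, and the names `all_hexamers` and `log2_enrichment` are hypothetical illustrations chosen for this sketch, not the analysis scheme of the proposed experiments.

```python
import itertools
import math

BASES = "ACGT"


def all_hexamers(k=6):
    """Return every DNA k-mer in lexicographic order (4**k of them; 4,096 for k=6)."""
    return ["".join(p) for p in itertools.product(BASES, repeat=k)]


def log2_enrichment(input_counts, spliced_counts, pseudocount=1.0):
    """Hypothetical score: log2 ratio of each k-mer's frequency among reads from
    spliced mRNA to its frequency in the starting (input) library.

    A pseudocount is added to every k-mer so unseen sequences do not cause
    division by zero. Positive scores suggest the insert promotes inclusion
    of the weakened exon; negative scores suggest it hinders inclusion.
    """
    kmers = set(input_counts) | set(spliced_counts)
    n = len(kmers)
    total_in = sum(input_counts.values()) + pseudocount * n
    total_sp = sum(spliced_counts.values()) + pseudocount * n
    scores = {}
    for kmer in kmers:
        f_in = (input_counts.get(kmer, 0) + pseudocount) / total_in
        f_sp = (spliced_counts.get(kmer, 0) + pseudocount) / total_sp
        scores[kmer] = math.log2(f_sp / f_in)
    return scores


if __name__ == "__main__":
    library = all_hexamers()
    print(len(library))  # number of distinct 6-nt inserts
    # Toy counts for illustration only, not experimental data.
    toy_input = {"GAAGAA": 100, "CCCCCC": 100}
    toy_spliced = {"GAAGAA": 300, "CCCCCC": 20}
    for kmer, score in sorted(log2_enrichment(toy_input, toy_spliced).items()):
        print(kmer, round(score, 3))
```

This sketch omits replicate libraries and any significance testing; comparing score tables from differently altered exons, as approach 1 proposes, would simply mean running the same scoring on each experiment's counts.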