The goal of this proposal is to comprehensively identify all sequence-based functional elements associated with transcribed sequences, both protein-coding and non-protein-coding, and to characterize gene structures, including transcription start sites (TSS), polyadenylation sites, and alternative transcripts, detected in a representative and diverse panel of human cells and tissues. Based on the empirically determined characteristics of the transcripts detected in this work, a classification system for the protein-coding and non-protein-coding portions of the human transcriptome will be established. Our aims are, first, to generate a comprehensive set of subcellular compartment-specific long (>200 nucleotides [nt]) and short (<200 nt) polyadenylated (polyA+) and non-polyadenylated (polyA-) RNA samples from each of the cell types studied. These RNA samples will be analyzed using: a) high-density tiling arrays (5-nt interrogation resolution for long and short RNAs), b) sequencing (pyrosequencing [454] and clonal single-molecule sequencing [Solexa] for short RNAs), c) sequenced paired-end ditags (PETs) for the 5' TSS and 3' termination locations of polyA+ transcripts, and d) sequenced cap analysis of gene expression (CAGE) tags for the 5' TSS of polyA- RNAs. Characterization of full-length subcellular compartment-specific transcripts will also be carried out using: 1) a combination of rapid amplification of cDNA ends (RACE), RT-PCR, and sequencing, 2) RNA immunoprecipitation (RIP), and 3) in situ immunohistochemistry. These characterization steps will provide additional information about the annotated and unannotated RNAs found to be associated with known functional, compartment-specific proteins and about their localization in subcellular organelles of known function. The research and health-care communities are well positioned to take advantage of a detailed catalog of classified transcribed regions in the human genome.
For example, the identification of millions of single nucleotide polymorphisms (SNPs) and the ability to genetically alter the expression of specific transcripts with small inhibitory (si-) and micro (mi-) RNAs are highly useful for the molecular characterization of diseases associated with the transcribed regions. However, the utility of these and other genomic resources depends on having a complete and high-quality catalog of transcribed regions.