We established an expression database of mouse type A spermatogonia, pachytene spermatocytes, and round spermatids using Serial Analysis of Gene Expression (SAGE). This analysis identified 16,091, 19,155, and 17,757 transcript species in spermatogonia, spermatocytes, and spermatids, respectively; 4,402 of these transcripts were novel. Computational analyses of the SAGE data led to the identification of stage-specific pathways and promoter modules and to the construction of biological networks associated with different stages of spermatogenesis. These analyses also identified a large number of genes with stage-specific alternatively spliced variants. Alternative splicing has been suggested to be a prominent genetic process during spermatogenesis, and a number of genes are known to undergo alternative splicing that confers novel activities on the variants. However, there has been no systematic study of the stage specificity of the splicing mechanism or of the expression of the variants. We have initiated the characterization of novel stage-specific variants of a number of genes, including heat shock protein 4 (Hspa4), H3 histone, family 3B (H3f3b), and ubiquitin protein ligase E3A (Ube3a). We will use Hspa4 as a model to investigate the role of alternative splicing in stage-specific regulation of gene function and its impact on the biological activity of the splice variants. Hspa4 has been shown to be induced in response to oxidative stress, which is critical for the survival and normal functioning of spermatozoa and male fertility. We confirmed the presence of three distinct Hspa4 transcripts in type A spermatogonia, pachytene spermatocytes, and round spermatids.
Further biochemical and functional studies are underway to characterize the regulatory mechanisms of Hspa4 and the biological functions of its isoforms in germ cells.

Analysis of the germ cell SAGE database also revealed a prominent presence of antisense transcripts. We are particularly intrigued by antisense transcripts derived from pseudogenes. Among the 19 genes with antisense transcripts that we identified, four (Uba52, Ch10, Calm2, and Ubb) had antisense transcripts derived from their pseudogenes on different chromosomes. These pseudogenes appear to have been derived from reverse transcripts of the respective parent genes and transposed into introns of actively transcribed genes: the Uba52 pseudogene resides in an intron of Cbx1; the Calm2 pseudogene is present in an intron of Prkar2b; the Ch10 pseudogene is contained in an intron of Sp3; and the Ubb pseudogene is located in an intron of Catsper2. More interestingly, the orientation of each pseudogene is anti-parallel to that of its host gene. Thus, the antisense transcripts of the pseudogenes would be produced as processed introns of their respective host genes. This raises the possibility that the two anti-parallel transcription units interact through hybridization of the sense and antisense transcripts. Subsequent experiments confirmed the presence of native double-stranded RNA for the anti-parallel gene pairs Uba52-Cbx1, Ch10-Sp3, and Calm2-Prkar2b. We will examine the relationship between the anti-parallel gene pairs using Uba52 and Cbx1 as a model. The functional Uba52 gene is on chromosome 8, while its pseudogene is on chromosome 11, embedded in the first intron of Cbx1. Uba52, Cbx1, and the sense and antisense transcripts of the Uba52 pseudogene are expressed in the mouse kidney cell line CRL-6436, which will be used as a model for the study.

Analyses of the antisense transcripts suggested the existence of RNA-dependent RNA polymerase in mouse germ cells.
Antisense transcripts complementary to multiple coding exons were identified for Tcte3, Ldh3, and Calm2. The presence of the Calm2 antisense transcript was confirmed in mouse testis and in three mouse cell lines: CRL-2576 (mouse spermatogonia cell line), CRL-1715 (mouse Sertoli cell line), and CRL-6436 (mouse kidney cell line). A knockdown experiment confirmed that the antisense transcript is a product of the sense transcript: knocking down the Calm2 sense transcript with siRNA reduced the levels of both the sense and antisense Calm2 transcripts, indicating that synthesis of the Calm2 antisense transcript depends on the sense transcript. Synthesis of the Calm2 antisense transcript did not start from the 3' end of the sense mRNA. The sequence representing the potential start site of RNA-dependent RNA polymerase (RdRP) action was defined. A hybrid RNA containing this sequence ligated to EGFP at its 5' end was generated and introduced into CRL-6436 cells. Orientation-specific RT-PCR showed the production of an antisense RNA derived from the hybrid RNA. This result provided further evidence for the existence of RdRP activity in mammalian cells. Experiments to isolate and purify the RdRP activity are underway.

Because of the discovery of extensive antisense transcription and alternative splicing, we decided to obtain a more global view of the transcriptional dynamics of the mouse genome during spermatogenesis. To this end, we examined the transcriptomes of type A spermatogonia, pachytene spermatocytes, and round spermatids using whole-genome GeneChip Mouse Tiling 1.0R Arrays from Affymetrix. We found that more than 45% of the transcripts are not annotated; current annotation accounts for only about 30% of the dataset, and the rest are mostly ESTs.
We confirmed the reliability and reproducibility of the results using reference genes known to be differentially expressed at different germ cell stages, such as the protamine gene family members, the meiosis expressed gene (Meig1), and the spermatogonia-specific gene Lin28. We successfully integrated the SAGE and tiling array platforms to accelerate the identification of gene regulation mechanisms during germ cell development. Antisense transcripts and pseudogenes obtained from our SAGE study and validated by RT-PCR were rapidly identified and confirmed in the tiling data set. The tiling data gave a more comprehensive picture of the configuration of transcript units, allowing better gene model prediction and validation. Initial analysis detected expression of 4,421 novel gene models. Sequence analysis of these genes indicated that the structures of nearly half of them, as predicted by existing algorithms, need improvement. Coupled with information provided by SAGE and CAGE, tiling microarray analyses identified 482 new gene models, representing an 11% increase in the annotated protein-coding capacity. The high-resolution detection of transcript units by the tiling array provides more reliable data for predicting splice variants specific to developing germ cells. To gain better insight into other regulatory mechanisms, the tiling data will be analyzed together with the Encyclopedia of DNA Elements (ENCODE) database. To date, we have found a surprisingly large number of non-coding RNAs (ncRNAs) that correspond to unannotated transcripts or to novel isoforms of protein-coding genes that appear not to encode polypeptides. We will continue to use the tiling platform to study the potential of ncRNAs, such as miRNAs, as regulatory molecules during germ cell development.