Identification of functional long non-coding RNAs in male germ cell development
Personnel: Lee, Cheung, Xiao, Boucheron, Chan, Rennert

Mammalian cells produce thousands of non-coding RNAs (ncRNAs) of unknown function. These non-protein-coding portions of the genome were long considered junk, but recent research has shown that ncRNAs can have a wide range of regulatory functions. Long ncRNAs (>200 bp) have been shown to be involved in the pluripotency and differentiation of mouse ESCs. Whole-genome tiling arrays and Serial Analysis of Gene Expression (SAGE) in our lab have demonstrated widespread transcription of long ncRNAs during male germ cell development. This project aims to identify male germ cell-specific long ncRNA candidates in order to enhance understanding of the regulatory functions of long ncRNAs during male gamete differentiation and development.

From analysis of our established male germ cell SAGE data, we isolated 494, 152, and 201 stage-specific tag sequences with at least five tag counts from spermatogonial stem cells (spermatogonia), pachytene spermatocytes, and round spermatids, respectively. A computational algorithm was developed to BLAST, map, and compare the RNA secondary structure of these candidates to assess long ncRNA expression. The SAGE sequences were also compared to various ncRNA databases, such as NRED, RNAdb, fRNAdb, and NONCODE v2.0, to determine the location of the tags in the genome. The selection of long ncRNAs was based on the SAGE tag count and the distance to the poly-A tail. Since SAGE normally cuts the RNA strand within a short distance of the poly-A tail, tags that matched ncRNAs at positions farther from the poly-A tail could be a result of repetitive sequences in the genome. A higher SAGE tag count indicates a more highly expressed sequence, which we considered more likely to play a regulatory role. Therefore, higher priority was given to sequences with a higher count that were located closer to the poly-A tail.
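The prioritization rule described above can be illustrated with a minimal sketch. It is not the algorithm used in the project: the record fields, the example values, and the choice to rank by tag count first with distance to the poly-A tail as a tie-breaker are all assumptions made for illustration. Only the minimum of five tag counts comes from the text.

```python
from dataclasses import dataclass


@dataclass
class TagCandidate:
    """Hypothetical record for one stage-specific SAGE tag."""
    tag_id: str
    count: int           # SAGE tag count in the stage-specific library
    dist_to_polya: int   # distance from the tag match to the poly-A tail


MIN_COUNT = 5  # from the text: at least five tag counts


def prioritize(candidates, min_count=MIN_COUNT):
    """Drop low-count tags, then rank the rest.

    Assumed ordering: higher count first; among equal counts,
    the tag closer to the poly-A tail comes first.
    """
    kept = [c for c in candidates if c.count >= min_count]
    return sorted(kept, key=lambda c: (-c.count, c.dist_to_polya))


# Made-up example values, for illustration only.
example = [
    TagCandidate("tagA", 12, 40),
    TagCandidate("tagB", 12, 900),
    TagCandidate("tagC", 3, 15),    # below the count threshold, dropped
    TagCandidate("tagD", 30, 200),
]

ranked = prioritize(example)
for c in ranked:
    print(c.tag_id, c.count, c.dist_to_polya)
```

A combined score that weights count against distance would be an equally plausible reading of the rule; the lexicographic sort is used here only because it is the simplest form to state.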
In total, we identified 50, 35, and 24 potential long ncRNA candidates in spermatogonia, pachytene spermatocytes, and round spermatids, respectively. We classified them by genomic features, including promoter, intronic, intergenic, and antisense. The top ten candidates from each stage were selected for subsequent analysis. Their expression and size were validated in male germ cell samples by Northern blot analysis. The expression of some long ncRNAs depended on the developmental age of the testis. Using RNA from 2-week, 1-month, 2-month, 3-month, 6-month, and 12-month mouse testis, six spermatogonia-specific candidates, six spermatocyte-specific candidates, and seven spermatid-specific candidates displayed a unique expression pattern. Some candidates also exhibited tissue-specific expression patterns: one spermatid-specific candidate was testis specific, one was testis and brain specific, and one was testis, brain, and ovary specific. Preliminary functional analysis in a P19 differentiation cell model suggested that the expression of some long ncRNAs decreased markedly following induction of differentiation by retinoic acid. The reduction was more pronounced in the comparison of testis from vitamin A-deficient (VAD) and control animals: some ncRNAs exhibited a more than thousand-fold decrease relative to control testis. These results suggest that long ncRNAs may play an active role in male germ cell differentiation and development via retinoic acid-related regulatory pathways. Additional functional studies are underway to identify these roles. We are developing an in vitro ESC-to-male gamete differentiation model from mouse R1 embryonic stem cells to study these phenomena at key male germ cell stages.
Male germ cell informatics
Personnel: Lee, Cheung, Chan; in collaboration with Dym, Claus, Sastry

We developed the first sequence-based germ cell transcriptome database, GermSAGE, for male germ cell transcriptome analysis. GermSAGE is a comprehensive web-based database, generated from Serial Analysis of Gene Expression (SAGE), representing major stages in mouse male germ cell development; each SAGE library has a sequence tag coverage of 150k. A total of 452,095 tags derived from type A spermatogonia, pachytene spermatocytes, and round spermatids are included. It is a web-based tool with customizable search parameters for browsing, comparing, and searching male germ cell transcriptome data at different developmental stages. The user can overlay male germ cell transcriptome data with a variety of annotated tracks below the genome view window, and create a custom map by adding tracks to view various types of data and specific genomic landmarks. The browser offers a broad list of options: 1) mapping and sequencing tracks that contain information about the position, markers, and GC percentage of the annotated gene region; 2) gene and gene prediction tracks with gene annotations and predictions from various sources; 3) mRNA and EST tracks that contain information on transcripts, as well as CAGE (Cap Analysis of Gene Expression) tags to identify potential transcription start sites and alternatively spliced RNA species; 4) expression and regulation tracks that include expression data from different microarray platforms and regulatory information such as CpG islands and microRNAs; 5) comparative genomics tracks that identify sequence conservation among species; and 6) variation and repeat tracks. This information provides insights into gene regulation and facilitates the generation of hypotheses. The data can be exported and visualized in a tabulated format, which permits flexible processing in downstream analysis pipelines as well as interaction analysis.
GermSAGE will be useful for revealing regulatory networks, enabling novel gene discovery, and providing insight into molecular and cellular processes. As an example, cross-platform data comparison of the molecular signature of spermatogonial stem/progenitor cells in 6-day-old mouse testis yielded novel candidates involved in stem cell maintenance. The long-term scientific vision of GermSAGE is to provide a central platform to scan the dynamic genomic changes in male germ cell development and to identify or predict gene expression patterns. Work is in progress to allow such dynamic data analysis. GermSAGE is freely available at http://germsage.nichd.nih.gov/.

Developmental staging of male murine embryonic gonad by SAGE analysis
Personnel: Lee, Chan, Rennert; in collaboration with Lau

To examine whether particular chromosomal regions play a more important role in male gonad development, we analyzed transcriptome activity at the chromosomal level at different time points in male gonad development by locating transcription hotspots, expressed in terms of chromosomal bands, using positional gene enrichment analysis. We observed a progressive increase in gene expression activity from E10.5 to E17.5 (data not shown), with the highest at E12.5 to E13.5, when Sertoli and Leydig cells undergo active genetic programming. To further delineate the functional roles of the transcription hotspots, positional gene enrichment analysis was performed on the overrepresented chromosomal regions. We compared the genes associated with the overrepresented chromosomal regions to the list of available mouse models from the Jackson Laboratory (http://jaxmice.jax.org) to see whether the corresponding mice carry defects in gonad development. The results showed a number of transgenic mice with defective hotspot genes that carry developmental defects of the gonads or gonadal tumors. Taken together, this approach provided another perspective by focusing on key chromosomal regions in male gonad development.
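The idea behind testing a chromosomal band for overrepresentation, as in the positional gene enrichment analysis described above, can be sketched as follows. The report does not state which statistic was used. The one-sided hypergeometric test below is an assumption, chosen because it is a common choice for enrichment analysis, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: total annotated genes in the genome
    K: genes located in the chromosomal band of interest
    n: genes detected as expressed at a given time point
    k: expressed genes that fall in the band
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Made-up numbers, for illustration only: a band holding 40 of 20,000
# genes, with 12 of 1,500 expressed genes mapping to that band.
p_value = hypergeom_upper_tail(k=12, N=20000, K=40, n=1500)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice, a p-value computed this way for every band would need correction for multiple testing before any band is called a hotspot.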