This application is an American Recovery and Reinvestment Act (ARRA) "Grand Opportunities" (RC2) application and describes a high-impact agenda that can be completed in two years. This work will create or preserve 21 American jobs. We propose an ambitious program of work that we are highly confident we can achieve, based on the experience and quality of the investigators, a record of bringing complex projects to fruition, substantial prior scientific interactions, and reliance on proven technologies in high-throughput facilities. Our goals are to develop a comprehensive understanding of the genomics of normal transcription and then to discover DNA and RNA biomarkers for major depressive disorder (MDD). This work is essential to developing a more complete understanding of the biological basis of MDD, a common complex trait associated with considerable morbidity, mortality, and personal and societal cost. All biological samples have already been collected from well-defined populations and are now available.

First, we will conduct a "genetical genomics" or expression quantitative trait locus (eQTL) study of 800 monozygotic (MZ) and 825 dizygotic (DZ) twin pairs. Each subject will be assayed for genome-wide single-nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) (Illumina 660K) and for gene expression (Affymetrix U133) in peripheral blood sampled under standardized conditions. In a subset of 200 MZ pairs, these data will be augmented with genome-wide methylation, miRNA, exon splicing, and nucleosome occupancy arrays. RNA-seq will be conducted on 10 MZ pairs for qualitative analyses. Analyses will determine the genetic architecture of every transcript (the genetic and non-genetic proportions of variance, estimated via twin analyses) and will then identify genome-wide associations (i.e., SNP-transcript eQTL pairs). These analyses will be extended to transcriptional modules. The key deliverable is a detailed catalog of the general and specific architecture of transcription, together with the raw intensity files.
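As a rough illustration of the two per-transcript analyses described for the twin sample, the sketch below partitions expression variance with Falconer's moment-based formulas and estimates an eQTL effect with a simple least-squares regression on allele dosage. These are simplified, swapped-in techniques and not necessarily the models the investigators will fit: a full analysis would more likely use maximum-likelihood ACE models and account for the non-independence of twins in the association tests. The function names and the simulated data are hypothetical.

```python
import numpy as np


def double_entry_icc(pairs):
    """Intraclass correlation for one transcript across twin pairs.

    `pairs` is an (n_pairs, 2) array of expression values. Each pair is entered
    twice, as (twin1, twin2) and (twin2, twin1), so the estimate does not depend
    on which twin happens to be labeled first.
    """
    pairs = np.asarray(pairs, dtype=float)
    x = np.concatenate([pairs[:, 0], pairs[:, 1]])
    y = np.concatenate([pairs[:, 1], pairs[:, 0]])
    return float(np.corrcoef(x, y)[0, 1])


def falconer_ace(r_mz, r_dz):
    """Falconer-style proportions of variance from MZ and DZ twin correlations.

    Assumes r_MZ = a2 + c2 and r_DZ = a2/2 + c2 (additive genetic effects,
    shared environment, unique environment; no dominance or assortative mating).
    """
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic proportion
    c2 = 2.0 * r_dz - r_mz    # shared (common) environment proportion
    e2 = 1.0 - r_mz           # unique environment plus measurement error
    return a2, c2, e2


def eqtl_slope(dosage, expression):
    """Least-squares slope of expression on allele dosage (0, 1, 2) for one
    SNP-transcript pair. Treats subjects as independent, which twins are not."""
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(expression, dtype=float)
    gc = g - g.mean()
    return float(np.dot(gc, y - y.mean()) / np.dot(gc, gc))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical simulated expression for a single transcript.
    # MZ twins share their whole genetic component; DZ twins share about half.
    g_mz = rng.normal(size=(800, 1))
    mz = g_mz + 0.6 * rng.normal(size=(800, 2))
    g_dz_shared = rng.normal(size=(825, 1))
    g_dz_own = rng.normal(size=(825, 2))
    dz = np.sqrt(0.5) * g_dz_shared + np.sqrt(0.5) * g_dz_own + 0.6 * rng.normal(size=(825, 2))

    a2, c2, e2 = falconer_ace(double_entry_icc(mz), double_entry_icc(dz))
    print(f"a2={a2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

Falconer's formulas are used here only because they make the logic of the MZ/DZ contrast visible in a few lines: the excess similarity of MZ over DZ pairs is attributed to additive genetic variance.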
Second, we seek to discover DNA and RNA biomarkers relevant to MDD, capitalizing on the Aim 1 results and on a large MDD study with repeated clinical and biological assessments. We have previously shown that gene expression in peripheral blood is a reasonable proxy for expression in the central nervous system. We will employ an advanced modeling framework. (a) Using baseline data, we will identify biomarkers for MDD by comparing SNPs, copy-number variants (CNVs), expression transcripts, and transcriptional modules between 1,400 MDD cases and 1,000 controls. (b) Using longitudinal data, we will contrast gene expression signatures assessed at baseline and two years later in 500 MDD cases and 200 controls.

PUBLIC HEALTH RELEVANCE: The work proposed here will create or preserve 21 American jobs. The data and results will be made widely available to the research community. Multiple commentators have called for a large, comprehensive, and careful study such as the one proposed here. MDD is a first-rank public health problem; this project is essential to developing a more complete understanding of its biological basis and could significantly advance our knowledge in a relatively short time.