Our group aims to use evidence of genetic association with measures of brain development and function as a basis for further characterization of molecular pathways. One study that demonstrates this examines genetic protection in healthy siblings of subjects with schizophrenia. Through the CBDB, we have access to a large sample of healthy siblings, which allows us not only to differentiate state from potential trait clinical phenomena but also to investigate the genetics of protection within a family. This is another example of the unique potential of our family samples. Because siblings within a family each have the same likelihood of inheriting risk-associated and protective alleles from their parents, it is reasonable to predict that healthy siblings will show distorted transmission of protective alleles, just as ill offspring show distorted transmission of risk alleles. We have tested this in a study of the gene PRODH (proline dehydrogenase), in which three functional alleles that alter POX (proline oxidase) enzyme activity, and therefore have a clear biological effect, showed distorted transmission to affected offspring in our families. We focused on haplotypes of the three alleles that have been confirmed to affect enzyme activity and, using FBAT (family-based association test), found significant overtransmission of the haplotype containing alleles related to increased POX activity to affected offspring. In dramatic contrast, significant overtransmission of the haplotype related to decreased POX activity was found in the unaffected siblings. We will adapt this novel strategy to identify protection-associated alleles and genes (which may not be the same as risk genes) related to unaffected status in our families in future studies, including our GWAS (genome-wide association study) analyses. This approach has the potential to define new targets for treatment.
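The logic of testing for distorted transmission separately in affected and unaffected siblings can be illustrated with the classic single-marker transmission disequilibrium test (TDT). This is a minimal sketch, not the FBAT haplotype analysis used in the study: the function name and the transmission counts are hypothetical, and it assumes transmissions from heterozygous parents have already been counted.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT (McNemar-type) test for one allele.

    transmitted / untransmitted: how often heterozygous parents did or did
    not pass the allele of interest to the offspring group being tested.
    Returns (chi-square with 1 df, two-sided p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only (not data from the study).
affected = tdt_statistic(transmitted=60, untransmitted=40)
unaffected = tdt_statistic(transmitted=40, untransmitted=60)
print("affected offspring:", affected)
print("unaffected siblings:", unaffected)
```

Because the statistic is symmetric, the direction of distortion has to be read from the counts themselves: overtransmission to affected offspring and undertransmission to unaffected siblings of the same allele give the same chi-square value.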
To date, we have tested numerous single nucleotide polymorphisms (SNPs) in well over 120 genes, including some of the less established but intriguing candidates such as PRODH, RGS4 (regulator of G-protein signaling 4), CHRNA7 (cholinergic receptor, nicotinic, alpha polypeptide 7), PIP5K2A (phosphatidylinositol-5-phosphate 4-kinase, type II, alpha), and PPP3CC (protein phosphatase 3, catalytic subunit, gamma). Among our accomplishments, we have fully sequenced the 10 exons and flanking sequences of dysbindin in 180 proband chromosomes, sequenced two exons of MRDS1 (orofacial clefting chromosomal breakpoint region 1), and sequenced 1.5 kb of the GAD1 (glutamic acid decarboxylase 1) upstream region. A total of 21 new SNPs were discovered in these genes, 15 of which were genotyped in the clinical samples. We have re-sequenced the exons and splice sites of GRM3 (glutamate receptor, metabotropic 3) in 180 chromosomes, which led to the discovery of a few rare SNPs. We have likewise re-sequenced risk regions of KCNH2 (potassium voltage-gated channel), ErbB4 (v-erb-a erythroblastic leukemia viral oncogene homolog 4), PI3K (phosphatidylinositol 3-kinase), FGF20 (fibroblast growth factor 20), DARPP (dopamine- and cAMP-regulated phosphoprotein), and COMT (catechol-O-methyltransferase) and identified novel variants in these genes as well. We routinely subject our TaqMan genotyping assays to reproducibility checks by re-genotyping (average accuracy >99%) and to spot accuracy checks by double-stranded sequencing (average >99% for most SNP assays). Genotypes are called manually and confirmed. We perform Mendelian checks and higher-order error checking (e.g. for multiple recombinants) with the program MERLIN. Microsatellite genotyping has been performed in collaboration with the NIMH Mood and Anxiety Program. We measure linkage disequilibrium (LD) between markers with the D' and r² statistics, computed in cases and controls in parallel, using the GOLD software package.
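For readers unfamiliar with the two LD measures named above, the following minimal sketch computes D' and r² for a pair of biallelic markers. It assumes the two-marker haplotype frequency is already known (GOLD estimates it from genotype data, which is not reproduced here); the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' rescales D by its maximum possible magnitude given allele frequencies.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is the squared correlation between alleles at the two loci.
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared


# Hypothetical frequencies, for illustration only.
print(ld_stats(p_ab=0.2, p_a=0.6, p_b=0.2))
```

The sketch reports the absolute value of D'; some packages report a signed value instead. D' can reach 1 while r² remains low when the two markers have very different allele frequencies, which is one reason both statistics are reported.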
All SNPs are tested for departures from Hardy-Weinberg equilibrium. For large numbers of loci, we use the program SNPHAP to reconstruct haplotypes and estimate their frequencies in unrelated individuals. For family-based association studies of the discrete clinical phenotype, we use the programs FBAT, TDTPHASE and TRANSMIT, which estimate haplotypes of unknown phase. Case-control analysis of individual SNPs and SNP haplotypes is done using logistic regression in STATA and the COCAPHASE program. All P values are computed empirically with 10,000 permutations or bootstraps, as the programs provide. Tests of association to quantitative traits such as the intermediate phenotypes are performed with FBAT and QTDT (quantitative transmission disequilibrium test); the latter allows variance-components testing of family-based samples for association and transmission disequilibrium. The orthogonal model used is robust to population stratification because, analogous to the conventional TDT, it considers only transmissions from heterozygous parents. To control for possible artifacts due to allele frequency differences across ethnic groups, an analysis limited to Caucasians is performed in parallel. We have also established a panel of unlinked SNPs to use as a potential genomic control panel for case-control association studies, including intermediate phenotype analyses, to address potential population admixture artifacts. In our genomics project we acquire extensive genetic variation data in our susceptibility genes and complete the catalog of risk genes in our datasets. As part of the GCAP program, we have greatly increased our genotyping throughput and reduced costs by purchasing high-throughput equipment for in-house testing. We project that about every 4 months for the next 2 years we will genotype a minimum of 768 SNPs, perform follow-up work on established genes and test novel genes. In addition, we outsource the majority of re-sequencing for SNP detection to DNA sequencing companies.
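As an illustration of the Hardy-Weinberg check applied to every SNP in the quality-control steps described above, the sketch below uses the standard one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. It is an assumption that a chi-square test (rather than an exact test) is used; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts at one biallelic SNP.
    Returns (chi-square with 1 df, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Cells with zero expectation (monomorphic SNP) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
print(hwe_chi_square(n_aa=30, n_ab=40, n_bb=30))
```

For SNPs with rare alleles the chi-square approximation is poor, and an exact test would be the more cautious choice.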
All exons, splice sites, and 10 kb of the upstream region are re-sequenced in an initial pass; selected regions of some genes (e.g. the introns or positive haplotypes) are then sequenced further and/or in more individuals. Because most functional SNPs and mutations are not in protein-coding regions, it is critical to fully characterize transcript species in several regions of postmortem human brain. To accomplish this, we routinely use basic mRNA transcript characterization technologies such as 5' and 3' RACE and screening of full-length transcripts in normalized cDNA libraries from multiple brain regions. This work also serves to guide quantitative RT-PCR (real-time polymerase chain reaction) and in situ hybridization expression studies. Another project of central importance is the statistical analysis of gene-gene interactions. It is likely that certain gene and allele combinations interact epistatically to produce risk greater than that predicted by the individual odds ratios. It is also likely that some gene combinations will increase risk even in the absence of main effects of each gene. We are using the data-driven analytic approach developed at Vanderbilt called multifactor dimensionality reduction (MDR) in an attempt to detect sets of interacting alleles that predict disease status. We also engage in collaborative discussions with Salford Systems, originator of the programs CART, MARS, and TREENET, to explore and execute other data mining strategies. Our statistical geneticist uses this wealth of data to model and test complex gene-gene and gene-environment interactions, and to establish objective criteria for integrating statistical genetic (disease and intermediate phenotype) data with convergent biological data, both to gauge the overall significance of given genotype/haplotype-phenotype correlations and to evaluate attributable risk.
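The core idea of MDR, pooling multilocus genotype cells into high-risk and low-risk groups and scoring how well that pooling predicts disease status, can be sketched as follows. This is a deliberately simplified illustration and not the MDR software itself: it searches only pairs of loci, scores on the full sample without the cross-validation and permutation testing that MDR uses, and all names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mdr_best_pair(genotypes: list[list[int]], status: list[int]):
    """Return ((locus_i, locus_j), accuracy) for the best-scoring locus pair.

    genotypes: one row per subject, one genotype code per locus
    status:    1 for case, 0 for control, one entry per subject
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls  # overall case:control ratio
    best = None
    for i, j in combinations(range(len(genotypes[0])), 2):
        # Count controls and cases in each two-locus genotype cell.
        cells = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
        for row, s in zip(genotypes, status):
            cells[(row[i], row[j])][s] += 1
        correct = 0
        for controls, cases in cells.values():
            # A cell is high-risk if its case:control ratio exceeds the
            # overall ratio; subjects in it are then predicted to be cases.
            high_risk = cases > threshold * controls
            correct += cases if high_risk else controls
        accuracy = correct / len(status)
        if best is None or accuracy > best[1]:
            best = ((i, j), accuracy)
    return best


# Hypothetical toy data: status depends on loci 0 and 1 jointly (XOR-like),
# with no main effect of either locus; locus 2 is noise.
toy_genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_status = [a ^ b for a, b, _ in toy_genotypes]
print(mdr_best_pair(toy_genotypes, toy_status))
```

The toy data mirror the scenario described above in which a gene combination predicts status although neither gene shows a main effect on its own. In real data, accuracy measured on the same sample used to define the high-risk cells is optimistic, which is why cross-validation is an integral part of the actual method.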