Our goal is to identify susceptibility genes for dyslexia, a disorder of neurobiological origin defined by unexpectedly low accuracy and/or rate of reading or spelling. This complex disorder affects 5-12% of school-aged children and, despite costly and intensive remediation, aspects of it persist into adulthood, with long-term educational, economic, and social repercussions. There is consensus from twin and family studies that genetic factors play a role in dyslexia, and strong evidence from linkage analyses that there are discoverable risk alleles/genes. The genetic paradigm provides a powerful approach for discovery and delineation of the underlying biochemical and neurodevelopmental pathways. This is particularly important given the absence of non-human model systems for studying this specifically human form of communication. Multiple genes and loci have been associated with dyslexia but, as expected for a common complex disorder, none accounts for a majority of cases, and causative DNA variants have not been confirmed. We will leverage the large, well-characterized set of families and linkage data we have amassed, coupled with large samples from a new multinational consortium, to identify genes and non-coding regulatory elements (gene-units) associated with component phenotypes of dyslexia. This goal will be accomplished through three specific aims: 1. Comprehensively evaluate and identify variants in gene-units of strong, previously proposed dyslexia candidate loci; 2. Discover, refine, and prioritize candidate gene-units for dyslexia component phenotypes in the most promising regions of interest identified in our cohort of families by prior genome scans; and 3. Validate the most promising gene-units in additional subject samples. Our proposed project has multiple major novel components. First, we will test a particular model of the genetic architecture of dyslexia by comprehensively analyzing DNA sequence data in focused regions of interest (ROIs) in family-based samples.
This will include evaluation of genomic sequence data from molecular inversion probe (MIP) capture of gene-units in large datasets. We will use analysis tools developed in our group to select the minimal number of informative family members to sequence, and will combine family-based and population-based imputation to augment the sequence data by imputing variants into the rest of the pedigree. This strategy will reduce the multiple-testing burden and increase power. Second, these analyses will incorporate ENCODE-annotated regulatory elements, which may harbor variants that affect gene expression or function in more subtle ways than protein-coding variants do. The focus on transcribed DNA and Mendelian traits that is typical of human genetic analyses of coding-exon data may miss important variants that affect quantitative, rather than qualitative, variation. Third, we will employ bioinformatics approaches to prioritize genes and regulatory regions that explain the observed phenotypic variation in ROIs. These approaches include gene burden and family-based association methods that simultaneously accommodate rare and common variants as contributors to the component phenotypes. Our multigenerational subject sample and a large subject sample from the consortium provide excellent resources for discovery and validation of relevant variants and genes.