Project Summary: "Expanded analysis of whole genome sequence data in cleft case-parent trios"
Oral clefts include cleft lip and palate (CLP), cleft lip (CL) and cleft palate (CP), and represent the most common group of craniofacial malformations in humans. All oral clefts show strong familial aggregation, and all have a complex and heterogeneous etiology. At least a dozen different genes have been confirmed to influence risk through genome-wide association studies and linkage scans. This R03 application to expand our analysis of whole genome sequencing (WGS) data on 415 case-parent trios supported by the Kids First project (X01-HL132363 "Kids First: Genomic Studies of Orofacial Cleft Birth Defects"; Marazita and Feingold, multiple-PIs; Univ. of Pittsburgh) is part of the Gabriella Miller Kids First Pediatric Research Program, a trans-NIH effort currently focused on gene discovery for pediatric cancers and structural birth defects (https://commonfund.nih.gov/KidsFirst). The Johns Hopkins group contributed samples to this Kids First project, and we will be actively collaborating with Dr. Marazita and the Univ. of Pittsburgh group in analyzing these forthcoming WGS data. Support from this R03 application will allow us to perform focused analyses of these WGS data, while developing efficient new analytical tools that will be immediately useful to all other Kids First groups. This project includes the following specific aims: Aim 1) to develop methods for using external functional information (including predictive scores and expression levels from outside resources) for intergenic regions where strong statistical evidence of linkage and association is detected, e.g., the chr. 8q24 gene desert region; Aim 2) to compare WGS data from the European ancestry trios in Kids First with available targeted sequencing data from European and Asian ancestry trios to document haplotype diversity using common and low frequency variants in regions showing differences in signal across populations; Aim 3) to enhance software available for analysis of case-parent trios by expanding the capabilities of the existing R package trio, which is now part of the Bioconductor computing platform, a freely available computing resource for statistical analysis of genomic data. These improvements to this analytical tool will be immediately useful to all research groups in the Kids First initiative, as well as outside users.