Rapid advances in DNA sequencing technology have enabled massive identification and cataloging of human allelic variation in research and clinical settings. A key challenge for human genetics today is to identify, among the myriad alleles, those variants that affect molecular function and phenotypes. We previously developed computational methods for predicting the functional effect of human mutations and non-synonymous SNPs and implemented these methods in the software tools PolyPhen and, subsequently, PolyPhen-2. We maintain both online and standalone versions of these tools in our laboratory. They are widely used by geneticists in a variety of research and clinical applications. The explosion of large-scale population sequencing projects has greatly increased demand for prediction methods. These projects also set new requirements: the methods must improve significantly, and the software must be tailored to specific applications in modern, technologically advanced human genetics. Specifically, massive exome sequencing projects that aim to identify genes harboring rare coding variants involved in human phenotypes require highly accurate, easy-to-use, and fast methods for annotating large numbers of sequence variants. At the same time, DNA sequencing is rapidly becoming a method of choice in clinical genetic diagnostics. Interpretation of novel sequence variants in human disease genes is becoming the major bottleneck in diagnostic analysis of sequencing data. Applications to clinical genetic diagnostics require a substantial increase in the accuracy of prediction methods, as well as the development of methods that target specific protein groups and generate predictions specific to individual diagnostic tests. The current need for interpretation of sequence variants is paralleled by the opportunity to greatly enhance computational methods and software. Genomes of multiple vertebrates provide a rich resource of information for generating predictions.
New statistical approaches are needed to make optimal use of these data. The recent growth of databases of human mutations and common SNPs provides much larger training and testing datasets. New methods should be developed to benefit fully from these larger datasets. In Specific Aim 1, we will develop a prediction method guided by the phylogenetic tree that utilizes alignments of vertebrate genomes. We will further incorporate interactions between amino acid positions into the analysis of comparative genomics data to account for compensatory substitutions. In Specific Aim 2, we will develop a version of the PolyPhen software for the analysis of exome or genome sequencing datasets. We will integrate functional predictions into statistical tests to detect phenotypic association of rare non-synonymous variants. In Specific Aim 3, in close collaboration with clinical geneticists, we will test the feasibility of developing prediction methods specialized for individual diagnostic tests that would achieve clinically useful levels of specificity and sensitivity.

PUBLIC HEALTH RELEVANCE: A key challenge for human genetics today is to identify, among the myriad alleles discovered by massive DNA sequencing projects, the genetic variants that affect molecular function and human disease. We previously developed widely used software for predicting the functional effect of human alleles. We plan to substantially increase the accuracy of the computational prediction method and to adapt it to the needs of large-scale sequencing projects and specific genetic diagnostic tests.