In 2003, a complete draft of the human genome was finished. Efforts are now underway to extend sequencing to a larger number of individuals, some of whom are participants in studies looking for links between genetic variation and health outcomes. Other projects will sequence only small parts of the genome that are suspected to be most relevant to disease. Association studies like these are important to NHLBI because they can identify causative mutations and genes more directly than previous kinds of genetic association study. However, these studies will run into difficulties with the statistical analysis of their results. This fellowship will train the candidate in the development of novel techniques that increase the utility of advances in the underlying technology for understanding disease. Current methods test each variant (single nucleotide polymorphism, SNP) for a difference in frequency between diseased and normal participants. However, several features of sequence data will make such comparisons uninformative. First, a huge number of SNPs can be detected, around twenty million to date; the multiple testing problem with this many SNPs means that only the strongest associations are detectable. Second, functional SNPs (those changing the protein makeup) come in several varieties, and non-functional SNPs have many potential relationships to the nearby gene. Current methods treat these varieties as equivalent, which ignores potentially important information. Third, these methods do not take advantage of the clustering of relevant variants near each other. Once one SNP has been identified as associated with disease, other SNPs in the same gene should be considered more likely than before to be relevant. Fourth, many of the SNPs identified by sequencing will be rare.
If a SNP appears in only a few people in a study, the evidence for or against its importance will be weak, and the study will not be able to reach conclusions about most of the variation it detects. We will develop new analysis methods that address these problems. Our methods will use predefined collections of SNPs within and near protein-coding regions and conserved non-coding regions to look for patterns of association, taking into account local correlation, SNP frequencies, and predicted SNP type. First, we will test and refine the method for whole-genome sequencing using simulation as well as the initial draft of the 1000 Genomes Project and drug-response phenotypes. Second, we will demonstrate our techniques for candidate gene confirmation using resequencing of 9 genes previously implicated in asthma in 500 normal individuals and 500 asthmatics. PUBLIC HEALTH RELEVANCE: This project will give genetic resequencing projects additional power to detect disease-causing mutations. Locating disease-causing mutations will allow the creation of effective screening tests and suggest targets for new therapies for common diseases such as asthma.