(1) Detecting and characterizing haplotype-trait associations.

This work has focused on improving the characterization of haplotype associations with traits by incorporating haplotype-specific variance parameters into the likelihood for genotypic data. Inference proceeds within a likelihood framework that involves simultaneous estimation of haplotypic effects and haplotype frequencies. The addition of the haplotypic variance is found to improve the power to detect associations under complex models, including those where only a subset of functional polymorphisms has been scored, as well as heterogeneity models where multiple mutations are linked to the haplotypes under study via linkage disequilibrium. Association tests and estimation procedures have been developed for un-phased haplotypes, as well as for entire un-phased diplotypes. An overall association test that includes all of the haplotypes at once has been derived as well. The method was successful in finding a strong association of adrenergic receptor beta-2 (ADRB2) haplotypes with blood pressure.

(2) Effect reversal in association studies.

Failure to replicate a genetic association is a common problem. It has been observed that the direction of the effect may also be reversed between studies. Although the explanation for many of these cases is likely to be statistical in nature, it has been suggested that a reversal of effect (flip-flop) can be a consequence of a change in linkage disequilibrium (LD) between a causal variant and the observed variant.
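As a purely illustrative sketch of the LD-driven flip-flop suggested above, the Python fragment below computes the apparent effect of an observed marker allele when the trait is influenced only by an unobserved causal allele. The two-locus haplotype labels (`MC`, `Mc`, `mC`, `mc`), the additive haplotype-level effect, and the example frequencies are assumptions made for illustration; they are not taken from the work described here.

```python
def marker_effect(hap_freqs, causal_effect):
    """Apparent effect of marker allele M on the mean trait value.

    hap_freqs: hypothetical haplotype frequencies with keys
               'MC', 'Mc', 'mC', 'mc' (M/m = observed marker alleles,
               C/c = unobserved causal alleles); they should sum to 1.
    causal_effect: assumed additive shift in the trait mean for a
                   haplotype carrying the causal allele C.
    """
    p_m_upper = hap_freqs["MC"] + hap_freqs["Mc"]          # frequency of M
    p_c_given_m_upper = hap_freqs["MC"] / p_m_upper        # P(C | M)
    p_c_given_m_lower = hap_freqs["mC"] / (1.0 - p_m_upper)  # P(C | m)
    # The marker only "sees" the causal allele through this difference,
    # which equals D / (p_M * (1 - p_M)) for LD coefficient D.
    return causal_effect * (p_c_given_m_upper - p_c_given_m_lower)


# Two hypothetical populations with the same marker allele frequency (0.5)
# but opposite signs of LD between marker and causal variant.
population_1 = {"MC": 0.4, "Mc": 0.1, "mC": 0.1, "mc": 0.4}  # D = +0.15
population_2 = {"MC": 0.1, "Mc": 0.4, "mC": 0.4, "mc": 0.1}  # D = -0.15

effect_1 = marker_effect(population_1, causal_effect=1.0)
effect_2 = marker_effect(population_2, causal_effect=1.0)
print(effect_1, effect_2)  # the two effects have opposite signs
```

In this toy setting the causal effect is identical in both populations, and only the sign of the LD coefficient changes the direction of the marker's apparent effect.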
A more general model has been developed, showing that a flip-flop can be completely attributed to a change in LD only when the studied variant is merely a proxy marker for unobserved functional variation. More generally, it has been shown that a flip-flop can occur without a change in LD, or even when the LD is zero. Specific conditions have been derived for the form of genetic effects that allow for such flip-flops. In this model, a flip-flop is driven by a shift in population haplotype or allele frequencies, even though both the population prevalence and the allele frequency of the observed variant can be the same in two populations that exhibit a flip-flop. If all relevant variants are scored, a flip-flop can no longer take place, so it is a consequence of partial knowledge. In the case of a quantitative trait, the unobserved variants induce a difference in the variance of the trait among individuals with different scored alleles. Based on this observation, a statistical approach has been developed for discovering associations. The approach is more robust than conventional methods to the loss of power caused by a genetic flip-flop.

(3) Correlation-based inference for linkage disequilibrium.

The correlation between alleles at a pair of genetic loci is a measure of linkage disequilibrium. For loci with two alleles, the square of the sample correlation multiplied by the sample size provides the usual test statistic for the hypothesis of no disequilibrium, and this relation has proved useful for study design and marker selection.
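The di-allelic relation just described can be sketched in a few lines of Python. The function name, the argument layout (phase-known haplotype counts for alleles A/a at one locus and B/b at the other), and the example counts are assumptions made for illustration. The p-value uses the standard chi-square reference distribution with one degree of freedom, for which the upper tail probability equals erfc(sqrt(x/2)).

```python
import math


def ld_correlation_test(n_AB, n_Ab, n_aB, n_ab):
    """Correlation-based LD test for two di-allelic loci.

    Arguments are hypothetical phase-known haplotype counts.
    Returns (r, n * r**2, p_value), where the p-value refers the
    statistic to a chi-square distribution with 1 degree of freedom.
    Both loci must be polymorphic in the sample.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n           # frequency of allele A
    p_B = (n_AB + n_aB) / n           # frequency of allele B
    d = n_AB / n - p_A * p_B          # LD coefficient D
    r = d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    stat = n * r ** 2
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return r, stat, p_value


r, stat, p_value = ld_correlation_test(40, 10, 10, 40)
print(r, stat, p_value)
```

For a 2-by-2 table this statistic coincides with Pearson's chi-square computed on the same counts.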
Nevertheless, this relation holds only in the di-allelic case, and an extension to multiple alleles has not been made. We studied a similar statistic, R2, which leads to a correlation-based test for loci with multiple alleles. One advantage of this approach is that R2 can be interpreted as the total correlation between a pair of loci. When the phase of two-locus genotypes is known, the approach is equivalent to a novel test for the overall correlation between rows and columns in a contingency table. In the phase-known case, R2 is the sum of the squared sample correlations for all 2-by-2 subtables formed by collapsing to one allele versus the rest at each locus. We examined the approximate distribution of R2 under the null hypothesis of independence and found it to be in close agreement with the exact distribution obtained by permutations. The test for independence using R2 is a strong competitor to approaches such as Pearson's chi-square, Fisher's exact test, and a test based on Cressie and Read's power divergence statistic. We combine this approach with previously proposed composite-disequilibrium measures to address the case when the genotypic phase is unknown. Calculation of the new multi-allele test statistic and its p-value is simple, utilizing the approximate distribution of R2.

(4) Combining p-values in large scale genomics experiments.

In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments.
In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of the corresponding distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We exploit this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods comes at the expense of a weaker claim upon rejection of the null hypothesis: only that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of fewer than L p-values, we investigate two methods with an explicit truncation: the rank truncated product method (RTP), which combines the first K ordered p-values, and the truncated product method (TPM), which combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like that of GM, more appropriately about all L tests. GM gives somewhat higher power than the TPM, RTP, Fisher, and Simes methods across a range of simulations.

(5) Ranks of a true association in large scale genetics experiments.

In the context of a large collection of statistical genetics tests in which the number of true associations (TAs) is small, we study the distribution of the ranks of TAs among the false associations (FAs).
We investigate the relative efficiency of ranking measures and how many of the top-ranked results need to be screened to cover TAs with high probability, using a few different ways of assessing significance and adjusting for multiple testing. This way of looking at the problem can aid in optimally following up on initial significant findings and in planning future large scale experiments. Genome-wide expression studies are one prominent example, where the number of measured transcription units is in the tens of thousands. Even larger are whole-genome association scans, where the number of tests, L, is now commonly in the hundreds of thousands. The measure of association with a trait of interest could be a p-value, possibly weighted towards effect size. Under a fairly wide set of conditions, we study the rank distribution of the p-value from a single TA amongst a large number of FAs. We present the impact of multiple testing adjustments on the rank distributions. This study identifies situations where ranking results by the effect size produces better ranks of TAs than the usual sorting by a test statistic value or by a p-value.
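The rank question for a single TA among many FAs can be explored with a small Monte Carlo sketch such as the one below. It is an illustration under stated assumptions rather than the procedure used in this study: test statistics are taken to be independent standard normal z-scores for FAs and a normal z-score with shifted mean for the TA, ranking is by two-sided p-value (equivalently by absolute z-score), and the function names, number of tests, effect size, and seed are all hypothetical.

```python
import math
import random


def simulate_ta_ranks(n_tests, effect, n_reps, seed=0):
    """Simulate the rank of one TA among n_tests - 1 FAs.

    Assumes independent z-scores: N(0, 1) for FAs, N(effect, 1) for the TA.
    Rank 1 means the TA has the smallest two-sided p-value.
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_reps):
        z_ta = abs(rng.gauss(effect, 1.0))
        # Count FAs whose absolute z-score beats the TA's.
        n_better = sum(
            1 for _ in range(n_tests - 1) if abs(rng.gauss(0.0, 1.0)) > z_ta
        )
        ranks.append(1 + n_better)
    return ranks


def screening_depth(ranks, coverage=0.95):
    """Smallest number of top results that contains the TA in at least
    the requested fraction of simulated experiments."""
    ordered = sorted(ranks)
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[index]


ranks = simulate_ta_ranks(n_tests=1000, effect=4.0, n_reps=200, seed=1)
print(screening_depth(ranks, coverage=0.95))
```

Replacing the absolute z-score with another ranking measure, such as an estimated effect size, would allow the kind of comparison between ranking measures described above, provided a model for that measure is supplied.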