To help analyze and understand aging-related complex traits that are affected by many genes and environmental factors, we propose to develop three statistical algorithms for the analysis of genome-wide genotyping and high-throughput sequencing studies. Our proposed computational tools provide the means to analyze additional types of data, for example to estimate mitochondrial DNA (mtDNA) copy number efficiently from whole-genome sequences, or to identify mtDNA variants from whole-exome sequencing. To test these algorithms, we take advantage of the special features of the SardiNIA project (see Annual Report AG000675), which has collected longitudinal data for >600 quantitative traits together with whole-genome genetic data in the Sardinian founder population. In the past year, for example, this has involved epidemiological analyses of frailty-related traits (walking speed, grip strength, and bone density) and hearing capacity as a function of age and sex.

To enable analyses of large-scale consortium data that study mtDNA copy number as a critical determinant of mitochondrial function and a potential biomarker for disease, we are developing an ultra-fast program to estimate mtDNA copy number from whole-genome sequencing (WGS) data. We and other groups have previously shown that the mtDNA copy number per cell can be estimated directly from WGS. The computation rests on the rationale that sequencing coverage should be proportional to the underlying DNA copy number for both autosomal and mitochondrial DNA. Most of the computing time is spent calculating the average autosomal DNA coverage across 3 billion bases, which makes analyzing tens of thousands of available samples very slow. We are developing fastMitoCalc, a program that takes advantage of the indexing of sequencing alignment files and uses a small, randomly selected subset (0.1%) of the nuclear genome to estimate autosomal DNA coverage accurately. It is more than 100 times faster than current programs.
Consequently, a computer cluster with 50 CPUs can now finish analyzing 10,000 low-pass sequencing samples in about 3 hours rather than the 25 days previously required. With fastMitoCalc, it becomes much more feasible to analyze hundreds of thousands of genomes to test for association of mtDNA copy number with quantitative traits or nuclear variants.

To take advantage of the large-scale whole-exome sequencing (WES) data sets now available, we are developing and testing algorithms that use off-target sequences from WES to identify mtDNA variants. WES technology uses exome capture kits to pre-select the protein-coding DNA regions (targeted regions) of the genome and then carries out sequencing reactions. Although one might expect that the mtDNA genome would not be substantially covered by the sequencing reactions, multiple studies have shown that off-target mtDNA sequences that fully cover the mtDNA genome can be reliably obtained from WES. We propose to extract all sequence reads aligned to the mtDNA reference genome, including off-target reads, and then use our program mitoCaller to identify mtDNA variants. To validate variant calling from WES data, we will take advantage of the individuals in the 1000 Genomes Project who were sequenced by both WES and WGS, and use the variants called from WGS as the standard. We expect very high concordance between the variant genotypes identified by WES and by WGS.

To investigate and improve the prediction of phenotypes, which is a major goal of personalized medicine, we are implementing, in ongoing work, linear mixed models to evaluate the prediction accuracy for a given phenotype with increasingly comprehensive genetic data (e.g., from HapMap-imputed genotypes and sequencing-based genetic data), together with demographic data (e.g., family structure) and other related phenotypic traits.
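
The coverage-ratio rationale behind the mtDNA copy number estimate described above can be illustrated with a minimal Python sketch. It is not the fastMitoCalc implementation: the function names, the toy depth function, and the toy genome length are hypothetical, and the factor of 2 assumes that each cell carries two copies of autosomal DNA. The sketch only shows why averaging depth over a small random subset of positions (0.1% here) can stand in for averaging over every base.

```python
import random


def estimate_autosomal_coverage(depth_at, genome_length, fraction=0.001, seed=0):
    """Average sequencing depth over a random subset of positions.

    depth_at is a hypothetical callable returning the read depth at a
    0-based position; a real tool would query an indexed alignment file.
    """
    rng = random.Random(seed)
    n_sampled = max(1, int(genome_length * fraction))
    positions = rng.sample(range(genome_length), n_sampled)
    return sum(depth_at(p) for p in positions) / n_sampled


def mtdna_copy_number(mt_coverage, autosomal_coverage, autosomal_copies=2):
    """Copy number per cell from the coverage ratio.

    Assumes coverage is proportional to DNA copy number and that
    autosomal DNA is present in autosomal_copies copies per cell.
    """
    if autosomal_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    return autosomal_copies * mt_coverage / autosomal_coverage


def toy_depth(position):
    """Synthetic depth that cycles through 28..32 (mean 30 over the genome)."""
    return 28 + position % 5


# Toy data: a 1,000,000-base "genome" and an assumed mtDNA coverage of 3000.
autosomal_cov = estimate_autosomal_coverage(toy_depth, 1_000_000)
copy_number = mtdna_copy_number(3000, autosomal_cov)
print(f"autosomal coverage ~ {autosomal_cov:.2f}, mtDNA copies per cell ~ {copy_number:.0f}")
```

In this toy setting the sampled autosomal coverage stays close to the true mean of 30, so the estimated copy number comes out at roughly 200 copies per cell while only 1,000 of the 1,000,000 positions are inspected.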